thadguidry opened a new issue, #2235:
URL: https://github.com/apache/hop/issues/2235

   ### Apache Hop version?
   
   2.3.0
   
   ### Java version?
   
   17.0.4.1
   
   ### Operating system
   
   Windows
   
   ### What happened?
   
   Wire up a simple workflow of
   Start -> unzip -> Success
   
   The .zip file is read from disk, and its contents are written correctly to a 
target folder on the same disk.
   The disk is a RAID 1 array of 7200 RPM HDDs.
   
   The single large 1.2 GB .zip file has taken over 4 hours to unzip so 
far, and I am still waiting.
   By comparison, on a different machine with a similar HDD configuration, 
64-bit 7-Zip completed the extraction in about 40 minutes.
   
   Looking at the code, I wonder whether the option `if file exists` = SKIP 
is a likely suspect.  My goal is to go as fast as possible: I don't care 
whether any of the more than 155,000 files already exist, I just want to blast 
through and overwrite them, no matter what.  I.e., maybe I should have chosen 
`if file exists` = OVERWRITE?  I don't know which settings would make zip file 
unpacking or the HopVfs streaming faster.  Maybe the unzip dialog should offer 
a setting that lets me use more memory or cores?
   
   The other suspect is the buffering and the small amount of memory used 
for unpacking.
   In Task Manager on Windows, I can see that Hop stayed at 50% CPU 
utilization across all 8 cores of my i7-9700K, with memory usage at 2.5 GB.
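   
   To make the two suspects above concrete, here is a minimal sketch of an 
extraction loop in plain `java.util.zip`.  It is not Hop's actual code and does 
not go through HopVfs; the class name `UnzipSketch`, the `overwrite` flag and 
the `bufferSize` parameter are hypothetical and exist only for illustration.  
With `overwrite = false` (the SKIP behavior) every entry costs one extra 
existence check on disk, and `bufferSize` controls how much is read and written 
per call.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Apache Hop.
final class UnzipSketch {

    /** Extracts zipFile into targetDir and returns the number of files written. */
    static int unzip(Path zipFile, Path targetDir, boolean overwrite, int bufferSize)
            throws IOException {
        int written = 0;
        byte[] buffer = new byte[bufferSize];
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zin = new ZipInputStream(
                new BufferedInputStream(Files.newInputStream(zipFile), bufferSize))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Reject entries that would land outside the target folder.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                    continue;
                }
                // SKIP behavior: one extra file-system lookup per entry.
                if (!overwrite && Files.exists(out)) {
                    continue;
                }
                Files.createDirectories(out.getParent());
                // Default open options create the file or truncate an existing one.
                try (OutputStream os =
                        new BufferedOutputStream(Files.newOutputStream(out), bufferSize)) {
                    int n;
                    while ((n = zin.read(buffer)) != -1) {
                        os.write(buffer, 0, n);
                    }
                }
                written++;
            }
        }
        return written;
    }
}
```

   Timing a call such as `UnzipSketch.unzip(zip, target, true, 1 << 20)` 
against the same call with `overwrite = false` or a small buffer would show 
whether either factor matters on a given disk; I have not measured this.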
   
   Can we do better?  Of course.  A general strategy perhaps:
   
   1. Work out how the Hop VFS architecture can unzip faster and optionally 
use more of the hardware.
   2. For now, warn users that extracting a .zip file containing many files 
can be slow, with a note to consider other unzip tooling for large, deep zip 
files when speed matters.
   3. As a final goal, eventually make Hop's own unzip faster.
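   
   As one possible reading of item 1, the sketch below extracts entries on 
several threads using `java.util.zip.ZipFile` and a parallel stream.  This is 
an assumption about what "use more of the hardware" could look like, not 
something Hop does today; the class name `ParallelUnzipSketch` is hypothetical, 
and it always overwrites existing files.  On spinning disks the work may well 
be bound by seeks rather than CPU, so a speedup is not guaranteed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Apache Hop.
final class ParallelUnzipSketch {

    /** Extracts all file entries in parallel and returns how many were written. */
    static long unzip(Path zipFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipFile zf = new ZipFile(zipFile.toFile())) {
            // Snapshot the entry list so it can be split across worker threads.
            List<? extends ZipEntry> entries = Collections.list(zf.entries());
            return entries.parallelStream()
                    .filter(e -> !e.isDirectory())
                    .mapToLong(e -> {
                        extract(zf, e, root);
                        return 1L;
                    })
                    .sum();
        }
    }

    private static void extract(ZipFile zf, ZipEntry entry, Path root) {
        Path out = root.resolve(entry.getName()).normalize();
        try {
            // Reject entries that would land outside the target folder.
            if (!out.startsWith(root)) {
                throw new IOException("Entry escapes target dir: " + entry.getName());
            }
            Files.createDirectories(out.getParent());
            try (InputStream in = zf.getInputStream(entry)) {
                // OVERWRITE behavior: no existence check, replace whatever is there.
                Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```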
   
   
   ### Issue Priority
   
   Priority: 3
   
   ### Issue Component
   
   Component: VFS


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]