Mykhailo Kozik created CAMEL-13399:
--------------------------------------

             Summary: ZipAggregationStrategy becomes slower as the size of the zip grows
                 Key: CAMEL-13399
                 URL: https://issues.apache.org/jira/browse/CAMEL-13399
             Project: Camel
          Issue Type: Bug
          Components: camel-zipfile
    Affects Versions: 2.23.1
            Reporter: Mykhailo Kozik
         Attachments: Screenshot 2019-04-08 18.41.10.png

I have a simple route that runs on demand and archives multiple files into a single zip archive.
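{code:java}
// Inside RouteBuilder.configure(); ZipAggregationStrategy is from camel-zipfile
from("file:/path/to/source")
    // constant(1): every file goes into the same aggregation group
    .aggregate(constant(1), new ZipAggregationStrategy(true, true))
    // complete once all files of the current poll (batch) have been aggregated
    .completionFromBatchConsumer()
    .eagerCheckCompletion()
    .to("file:/path/to/target");
{code}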
It works fine when the number of files in the source folder is relatively small.

After adding trace logging to measure the total size of the input files against the processing time, I was able to draw the following chart.

!Screenshot 2019-04-08 18.41.10.png!



In other words, building a zip archive from 500 MB of files takes over 12 minutes!

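It looks like, to add each file, Camel reads the existing zip archive back through an input stream, copies its entries together with the new file into a fresh archive, and writes the whole archive again. Because every addition rewrites everything added before it, the total work grows roughly quadratically with the number of files, which is not acceptable for large folders.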
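The cost pattern described above can be reproduced with plain java.util.zip. The sketch below is a simplified model built on that assumption, not Camel's actual code: the hypothetical `RebuildCost.rebuildWithEntry` copies every existing entry into a new in-memory `ZipOutputStream` and then appends one more file, and `RebuildCost.totalBytesWritten` sums the size of each rebuilt archive. Entries are random (incompressible) bytes so that sizes stay predictable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical model of "rebuild the whole archive for every new file";
// it is not taken from the Camel source code.
class RebuildCost {

    // Copies all entries of `zip` into a new archive and appends one more entry.
    static byte[] rebuildWithEntry(byte[] zip, String name, byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            if (zip != null) {
                try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
                    byte[] buf = new byte[8192];
                    ZipEntry entry;
                    while ((entry = zis.getNextEntry()) != null) {
                        zos.putNextEntry(new ZipEntry(entry.getName()));
                        int n;
                        while ((n = zis.read(buf)) > 0) {
                            zos.write(buf, 0, n);
                        }
                        zos.closeEntry();
                    }
                }
            }
            zos.putNextEntry(new ZipEntry(name));
            zos.write(data);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Adds `files` entries of `fileSize` random bytes one at a time and
    // returns the total number of bytes written across all rebuilds.
    static long totalBytesWritten(int files, int fileSize) {
        Random random = new Random(42);
        byte[] zip = null;
        long total = 0;
        for (int i = 0; i < files; i++) {
            byte[] data = new byte[fileSize];
            random.nextBytes(data);
            zip = rebuildWithEntry(zip, "file-" + i, data);
            total += zip.length;
        }
        return total;
    }
}
```

In this model, doubling the number of files (for example from 20 to 40 entries of 10,000 bytes) roughly quadruples the bytes written, because the k-th addition rewrites all k entries.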

As a workaround, I added a completionSize or completionPredicate to flush roughly every 100 MB. All files do get archived, but they are split across several archives, which works but is not ideal.
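The size-based flush limits the slowdown because each archive is only rebuilt up to the flush limit, so the quadratic term applies per batch rather than to the whole folder. A stdlib-only sketch of the grouping such a flush produces is shown below. It assumes files arrive in order and that a new archive starts whenever the next file would push the current one past the limit; `SizeBatching.partition` is a hypothetical helper, not a Camel API, and the exact boundary in Camel depends on how the completion condition is evaluated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a size-based flush; not part of Camel.
class SizeBatching {

    // Greedily groups file sizes (in arrival order) into batches whose total
    // does not exceed `limit`; a single file larger than `limit` gets its own batch.
    static List<List<Long>> partition(List<Long> sizes, long limit) {
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentTotal = 0;
        for (long size : sizes) {
            if (!current.isEmpty() && currentTotal + size > limit) {
                batches.add(current);
                current = new ArrayList<>();
                currentTotal = 0;
            }
            current.add(size);
            currentTotal += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

For file sizes of 60, 50, 30 and 80 MB with a 100 MB limit, this yields three archives: [60], [50, 30] and [80].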


Is there a general way to make ZipAggregationStrategy run in near-linear time, so that the process does not slow down as the number of files grows?
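One near-linear alternative, offered here as an assumption rather than as Camel's API, is to keep a single `ZipOutputStream` open for the whole batch, write each incoming file as a new entry, and close the stream only when the batch completes. Each byte is then compressed and written once. The hypothetical `StreamingZipAppender` below shows the idea with in-memory streams; a real aggregation strategy would also have to keep the open stream between exchanges, write to a file instead of memory, and close it reliably on completion or failure.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical append-only archiver: each entry is written exactly once,
// so total work is linear in the total input size.
class StreamingZipAppender implements AutoCloseable {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final ZipOutputStream zip = new ZipOutputStream(buffer);

    // Appends one file as a new entry without touching earlier entries.
    void add(String name, byte[] data) {
        try {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(data);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the central directory; call once when the batch is complete.
    @Override
    public void close() {
        try {
            zip.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the finished archive bytes (valid only after close()).
    byte[] toByteArray() {
        return buffer.toByteArray();
    }

    // Lists the entry names of a finished archive, useful for checking the result.
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

Usage: create one appender per batch, call `add` for every file as it arrives, and call `close()` when the batch completes; `toByteArray()` then holds a valid archive with all entries in arrival order.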



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)