I'm reprocessing a lot of data: about 45 days at ~70GB per day (roughly 3TB in
total). It's taking a while. Is there any configuration that might help demux
perform better when it's fed a lot of files? I've noticed the sort phase takes a
long time when there are too many maps. Can I lower the number of maps, or tune
anything else?
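For the map count, one knob that might help is Hadoop's `mapred.min.split.size`. This is only a sketch, and it rests on three assumptions: demux picks up extra properties placed in chukwa-demux-conf.xml, it reads its input through a standard FileInputFormat subclass, and the individual input files span several HDFS blocks. If those hold, a minimum split size larger than the block size should produce fewer, larger map tasks. The 256MB value is purely illustrative.

```xml
<!-- Hypothetical tuning, not a Chukwa-documented setting: ask FileInputFormat
     for splits of at least 256MB (268435456 bytes) so that each map task
     covers several HDFS blocks. Every input file still gets at least one
     split, so this does NOT merge many small files into a single map. -->
<property>
<name>mapred.min.split.size</name>
<value>268435456</value>
</property>
```

Lowering `mapred.map.tasks` alone probably won't have the same effect. That setting is only a hint, and in the old FileInputFormat split logic the split size stays capped at the block size unless the minimum split size is raised above it.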
I saw these properties in the config, but their values are still @TODO-...@
placeholders. Is there anything here I should configure?
<!-- Chukwa Job parameters -->
<property>
<name>io.sort.mb</name>
<value>@TODO-DEMUX-IO-SORT-MB@</value>
<description>The total amount of buffer memory to use while sorting
files, in megabytes. By default, gives each merge stream 1MB, which
should minimize seeks.</description>
</property>
<property>
<name>fs.inmemory.size.mb</name>
<value>@TODO-DEMUX-FS-INMEMORY-SIZE_MB@</value>
<description>The size of the in-memory filesystem instance in
MB</description>
</property>
<property>
<name>io.sort.factor</name>
<value>@TODO-DEMUX-IO-SORT-FACTOR@</value>
<description>The number of streams to merge at once while sorting
files. This determines the number of open file handles.</description>
</property>
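For comparison, here is a sketch of those three placeholders filled in with baseline values. The io.sort.mb and io.sort.factor numbers are Hadoop's stock defaults (100 and 10). The fs.inmemory.size.mb value of 75 is assumed from older hadoop-default.xml files. None of these are tuned recommendations for a job this size; they are only a starting point. If io.sort.mb is raised, the task JVM heap (mapred.child.java.opts) has to grow with it, because the map-side sort buffer is allocated inside the task's heap.

```xml
<!-- Chukwa Job parameters: placeholders filled with baseline values only.
     100 and 10 are Hadoop's stock defaults; 75 is an assumed older default. -->
<!-- Map-side sort buffer in MB. Larger values mean fewer spills to disk,
     but the buffer must fit inside the task heap (mapred.child.java.opts). -->
<property>
<name>io.sort.mb</name>
<value>100</value>
</property>
<!-- Size of the in-memory filesystem in MB (assumed old default). -->
<property>
<name>fs.inmemory.size.mb</name>
<value>75</value>
</property>
<!-- Streams merged per pass. A larger factor means fewer merge passes when
     there are many spill files, at the cost of more open file handles. -->
<property>
<name>io.sort.factor</name>
<value>10</value>
</property>
```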