I'm reprocessing a large amount of data: about 45 days at roughly 70GB per day,
and it's taking a long time. Is there any configuration that would help demux
perform better when it is fed a lot of files? I've noticed the sort phase takes
a long time when there are too many maps. Can I lower the number of maps, or
tune anything else?

I saw these properties in the config, but their values are still @TODO...@
placeholders. Should I set any of them?

<!-- Chukwa Job parameters -->
        <property>
          <name>io.sort.mb</name>
          <value>@TODO-DEMUX-IO-SORT-MB@</value>
          <description>The total amount of buffer memory to use while sorting
          files, in megabytes.  By default, gives each merge stream 1MB, which
          should minimize seeks.</description>
        </property>

        <property>
          <name>fs.inmemory.size.mb</name>
          <value>@TODO-DEMUX-FS-INMEMORY-SIZE_MB@</value>
          <description>The size of the in-memory filesystem instance in MB.</description>
        </property>

        <property>
          <name>io.sort.factor</name>
          <value>@TODO-DEMUX-IO-SORT-FACTOR@</value>
          <description>The number of streams to merge at once while sorting
          files.  This determines the number of open file handles.</description>
        </property>
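For reference, here is a sketch of what the same block might look like with the
placeholders filled in. The numbers are hypothetical and untuned, chosen only to
show the shape of the change (a larger sort buffer and a wider merge so that
many map outputs are merged in fewer passes). They assume the task JVM heap
(mapred.child.java.opts) has been raised enough to hold the io.sort.mb buffer,
since that buffer is allocated inside each task's heap.

```xml
<!-- Chukwa Job parameters: HYPOTHETICAL example values, not recommendations -->
<property>
  <name>io.sort.mb</name>
  <!-- assumed: task heap raised well above this value -->
  <value>200</value>
  <description>The total amount of buffer memory to use while sorting
  files, in megabytes.</description>
</property>

<property>
  <name>fs.inmemory.size.mb</name>
  <!-- assumed illustrative value -->
  <value>200</value>
  <description>The size of the in-memory filesystem instance in MB.</description>
</property>

<property>
  <name>io.sort.factor</name>
  <!-- assumed: wider merge means fewer merge passes but more open file handles -->
  <value>50</value>
  <description>The number of streams to merge at once while sorting
  files.  This determines the number of open file handles.</description>
</property>
```

Raising io.sort.factor trades merge passes for open file handles, so the
per-process file-descriptor limit on the task nodes would need to allow it.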
