The number of .done files to load is currently not adjustable.  Please file a
JIRA for this enhancement.
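
In the meantime, the manual batching you describe can be scripted.  Here is a
minimal sketch; it uses local directories as a stand-in for the HDFS paths, and
all paths, file counts, and thresholds below are assumptions taken from your
description, not Chukwa defaults:

```shell
# Sketch of the batching workflow described above, using local directories as
# a stand-in for HDFS (swap ls/mv for "hadoop fs -ls"/"hadoop fs -mv" on a
# real cluster; paths and thresholds are assumptions, not Chukwa defaults).
LOGS=./chukwa/logs        # where demux picks up .done files
STAGING=./chukwa/staging  # where archive files wait for reprocessing

# demo setup: 5 files already queued, 60 waiting in staging
mkdir -p "$LOGS" "$STAGING"
for i in $(seq 1 5);  do touch "$LOGS/old$i.done"; done
for i in $(seq 1 60); do touch "$STAGING/arch$i.done"; done

pending=$(ls "$LOGS" | grep -c '\.done$')
if [ "$pending" -lt 10 ]; then
  # top up: move at most 40 more .done files into the logs dir
  ls "$STAGING" | grep '\.done$' | head -40 | while read -r f; do
    mv "$STAGING/$f" "$LOGS/"
  done
fi
echo "logs now holds $(ls "$LOGS" | grep -c '\.done$') .done files"
```

Running this against the demo setup moves 40 of the 60 staged files, leaving
45 in the logs dir and 20 in staging; a cron job could rerun it periodically.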

Default.properties is designed to generate chukwa-demux-conf.xml at build
time.  Since your system is already deployed, you should modify
chukwa-demux-conf.xml and replace the placeholders with values manually.

Try these settings:

io.sort.mb 128
fs.inmemory.size.mb 128
io.sort.factor 10
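
For reference, once the placeholders are replaced, the corresponding entries
in chukwa-demux-conf.xml would look like the following.  This is a sketch
using the values above as a starting point for a 4GB task tracker; tune them
to your own hardware:

```xml
<!-- Chukwa Job parameters, with the build-time placeholders replaced -->
<property>
  <name>io.sort.mb</name>
  <value>128</value>
  <description>Total buffer memory to use while sorting files, in
  megabytes.</description>
</property>

<property>
  <name>fs.inmemory.size.mb</name>
  <value>128</value>
  <description>Size of the in-memory filesystem instance in MB.</description>
</property>

<property>
  <name>io.sort.factor</name>
  <value>10</value>
  <description>Number of streams to merge at once while sorting files.  This
  determines the number of open file handles.</description>
</property>
```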

Hope this helps.

Regards,
Eric

On 5/3/10 8:21 AM, "Corbin Hoenes" <cor...@tynt.com> wrote:

> Eric,
> 
> We are reprocessing from archive files.  We've played with how many files to
> move into the /chukwa/logs directory to get better speed and control the
> number of mappers the job uses.  I wondered if this was configurable (how
> many .done files does the demux job look for before starting a job).  For now
> we just move 40 new ones in when there are fewer than 10 in the logs dir.
> 
> Our reducer is just the default reducer.  That we get from AbstractProcessor.
> 
> On the config file...I pasted the literal file from my setup.  So what is the
> <value>@TODO-DEMUX-IO-SORT-MB@</value> supposed to do?
> 
> Should I replace that value with my own value like 128 for 128MB?  Or should I
> be replacing the default.properties value if I wanted to change this value?
> 
> 
> On May 2, 2010, at 11:57 AM, Eric Yang wrote:
> 
>> Are you reprocessing from archive files?  The number of mappers is mapped to
>> the number of files that you have.  Hence, having many small files would
>> surely slow things down quite a bit.  The parameters also depend on your
>> hardware.  The default settings in default.properties are used to generate
>> chukwa-demux-conf.xml.  They were set up for 4GB task tracker machines.
>> You may want to increase the numbers if you have more RAM.
>> 
>> I hope your reducers don't write out a lot of data; output is currently
>> partitioned by data type.  Hence, it may take a long time to write the final
>> output if the reducers need to emit TBs of data.  I filed a jira for
>> improving this a while ago:
>> 
>> https://issues.apache.org/jira/browse/CHUKWA-481
>> 
>> Hope this helps.
>> 
>> Regards,
>> Eric
>> 
>> On 5/2/10 6:51 AM, "Corbin Hoenes" <cor...@tynt.com> wrote:
>> 
>>> I'm reprocessing a bunch of data: ~45 days at ~70GB per day.  It's taking a
>>> while... is there some configuration that might help demux perform better
>>> when it's fed a lot of files?  I've noticed sort takes a long time when it
>>> has too many maps.  Can I lower the number of maps, etc.?
>>> 
>>> I saw this in the config but noticed the TODO comments.  Anything here I
>>> should configure?
>>> 
>>> <!-- Chukwa Job parameters -->
>>> <property>
>>> <name>io.sort.mb</name>
>>> <value>@TODO-DEMUX-IO-SORT-MB@</value>
>>> <description>The total amount of buffer memory to use while sorting
>>> files, in megabytes.  By default, gives each merge stream 1MB, which
>>> should minimize seeks.</description>
>>> </property>
>>> 
>>> <property>
>>> <name>fs.inmemory.size.mb</name>
>>> <value>@TODO-DEMUX-FS-INMEMORY-SIZE_MB@</value>
>>> <description>The size of the in-memory filesystem instance in
>>> MB</description>
>>> </property>
>>> 
>>> <property>
>>> <name>io.sort.factor</name>
>>> <value>@TODO-DEMUX-IO-SORT-FACTOR@</value>
>>> <description>The number of streams to merge at once while sorting
>>> files.  This determines the number of open file handles.</description>
>>> </property>
>>> 
>> 
> 
