The number of .done files to load is currently not adjustable. Please file a JIRA for this enhancement.
The default.properties file is designed to generate chukwa-demux-conf.xml at
build time. Since your system is already deployed, you should modify
chukwa-demux-conf.xml and replace the values manually. Try these settings:

io.sort.mb          128
fs.inmemory.size.mb 128
io.sort.factor      10

Hope this helps.

Regards,
Eric

On 5/3/10 8:21 AM, "Corbin Hoenes" <cor...@tynt.com> wrote:

> Eric,
>
> We are reprocessing from archive files. We've played with how many files to
> move into the /chukwa/logs directory to get better speed and control the
> number of mappers the job uses -- I wondered if this was configurable (how
> many .done files does the demux job look for before starting a job). For
> now we just move 40 new ones in when there are fewer than 10 in the logs
> dir.
>
> Our reducer is just the default reducer that we get from AbstractProcessor.
>
> On the config file... I pasted the literal file from my setup. So what is
> the <value>@TODO-DEMUX-IO-SORT-MB@</value> supposed to do?
>
> Should I replace that value with my own value, like 128 for 128MB? Or
> should I be replacing the default.properties value if I wanted to change
> this?
>
>
> On May 2, 2010, at 11:57 AM, Eric Yang wrote:
>
>> Are you reprocessing from archive files? The number of mappers maps to
>> the number of files that you have, so having many small files will surely
>> slow things down quite a bit. The right parameters also depend on your
>> hardware. The default settings in default.properties are used to generate
>> chukwa-demux-conf.xml; they were set up for 4GB task tracker machines.
>> You may want to increase the numbers if you have more RAM.
>>
>> I hope your reducers don't write out a lot of data; the output is
>> currently partitioned by data type, so it may take a long time to write
>> the final output if the reducers need to emit TBs of data. I filed a jira
>> for improving this a while ago:
>>
>> https://issues.apache.org/jira/browse/CHUKWA-481
>>
>> Hope this helps.
>>
>> Regards,
>> Eric
>>
>> On 5/2/10 6:51 AM, "Corbin Hoenes" <cor...@tynt.com> wrote:
>>
>>> I'm reprocessing a bunch of data: ~45 days at ~70GB per day. It's taking
>>> a while... is there some configuration that might help demux perform
>>> better when it's fed a lot of files? I've noticed the sort takes a long
>>> time when there are too many maps. Can I lower the number of maps, etc.?
>>>
>>> I saw this in the config but noticed the TODO comments. Anything here I
>>> should configure?
>>>
>>> <!-- Chukwa Job parameters -->
>>> <property>
>>>   <name>io.sort.mb</name>
>>>   <value>@TODO-DEMUX-IO-SORT-MB@</value>
>>>   <description>The total amount of buffer memory to use while sorting
>>>   files, in megabytes. By default, gives each merge stream 1MB, which
>>>   should minimize seeks.</description>
>>> </property>
>>>
>>> <property>
>>>   <name>fs.inmemory.size.mb</name>
>>>   <value>@TODO-DEMUX-FS-INMEMORY-SIZE_MB@</value>
>>>   <description>The size of the in-memory filesystem instance in
>>>   MB</description>
>>> </property>
>>>
>>> <property>
>>>   <name>io.sort.factor</name>
>>>   <value>@TODO-DEMUX-IO-SORT-FACTOR@</value>
>>>   <description>The number of streams to merge at once while sorting
>>>   files. This determines the number of open file handles.</description>
>>> </property>
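
For reference, on an already-deployed system you would edit
chukwa-demux-conf.xml directly and put concrete values where the
@TODO-...@ placeholders sit. A minimal sketch with the settings suggested
above (128/128/10 assumes roughly 4GB task tracker machines, per Eric's
note; increase the numbers if you have more RAM):

<!-- chukwa-demux-conf.xml, edited by hand on a deployed system -->
<property>
  <name>io.sort.mb</name>
  <value>128</value>
  <description>Total buffer memory, in MB, to use while sorting
  files.</description>
</property>

<property>
  <name>fs.inmemory.size.mb</name>
  <value>128</value>
  <description>Size of the in-memory filesystem instance in
  MB.</description>
</property>

<property>
  <name>io.sort.factor</name>
  <value>10</value>
  <description>Number of streams to merge at once while sorting files;
  determines the number of open file handles.</description>
</property>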