On Fri, May 3, 2013 at 11:05 AM, Aaron McCurry <[email protected]> wrote:
> Ok, so the better approach is to create a second new index and index the
> entire row into that new small index. Then once the row is complete, close
> that new writer and index and merge it into the main index. This allows us
> to index everything and not run the reducer out of memory.
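
A rough sketch of the per-row temp-index-and-merge step Aaron describes, written against the plain Lucene 4.x API. The class and method names, the Version constant, and the idea of reusing the reducer's analyzer are illustrative assumptions, not Blur's actual BlurReducer code:

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class RowMergeSketch {

  // Index every Document of one row into a throwaway in-memory index,
  // then merge the completed row into the main (shard) writer. A row
  // that grows too large only affects this small index, not the
  // reducer's long-lived main writer.
  static void indexRow(IndexWriter mainWriter, Analyzer analyzer,
      Iterable<Document> rowDocs) throws IOException {
    Directory tmpDir = new RAMDirectory();  // second, per-row index
    IndexWriter tmpWriter = new IndexWriter(tmpDir,
        new IndexWriterConfig(Version.LUCENE_43, analyzer));
    try {
      for (Document doc : rowDocs) {
        tmpWriter.addDocument(doc);  // the entire row, record by record
      }
    } finally {
      tmpWriter.close();  // close the new writer once the row is complete
    }
    mainWriter.addIndexes(tmpDir);  // merge it into the main index
  }
}
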
So move to the temporary index approach as the way to do all the M/R
builds, vs. just an exception for large rows?

--tim

> On Fri, May 3, 2013 at 10:59 AM, Tim Williams <[email protected]> wrote:
>
>> Thanks, this helps. I'm looking into patching the BlurReducer so that
>> when a Row hits maxRecordsPerRow, it indexes what it can of a row - as
>> opposed to dropping it completely. What's a better approach? :)
>>
>> --tim
>>
>> On Fri, May 3, 2013 at 10:44 AM, Aaron McCurry <[email protected]> wrote:
>> > BlurTask._maxRecordCount
>> >
>> > This is used for testing, so that you can exit a mapper after N number
>> > of records.
>> >
>> > BlurTask._maxRecordsPerRow
>> >
>> > This will increase the number of records in a single row. Be careful
>> > with this option because it may run the reducer out of memory. I have
>> > a patch that I can apply that removes this limit, but for now it's
>> > still risky to increase this too much.
>> >
>> > BlurTask._ramBufferSizeMB
>> >
>> > This is the Lucene writer buffer; larger values normally increase
>> > indexing throughput.
>> >
>> > Aaron
>> >
>> >
>> > On Fri, May 3, 2013 at 10:30 AM, Tim Williams <[email protected]> wrote:
>> >
>> >> I have an instance where I need to increase max records per row, but
>> >> before I do I want to understand the relationship (if there is one)
>> >> between:
>> >>
>> >> BlurTask._maxRecordCount
>> >> BlurTask._maxRecordsPerRow
>> >> BlurTask._ramBufferSizeMB
>> >>
>> >> I understand maxRecordsPerRow, but in looking into this I found I
>> >> don't understand _maxRecordCount and/or what interplay might exist
>> >> with buffer size.
>> >>
>> >> Thanks,
>> >> --tim
>> >>
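
For reference, _ramBufferSizeMB is the value handed to Lucene's writer config. A minimal standalone illustration, where the 256 MB figure and the analyzer and Version choices are assumptions to be tuned, not recommended settings:

// Larger RAM buffer -> fewer segment flushes, usually higher indexing
// throughput; it also competes with row buffering for reducer heap.
IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_43, analyzer);
conf.setRAMBufferSizeMB(256.0);  // assumed value; tune against reducer heap
IndexWriter writer = new IndexWriter(dir, conf);
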
