Ok, so the better approach is to create a second, small index and index the entire row into it. Once the row is complete, close that writer and merge the small index into the main index. This lets us index every record in the row without running the reducer out of memory.
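A rough sketch of that flow, in plain Java with toy stand-ins rather than the real Lucene or Blur classes. ToyIndex, ToyWriter and indexRow are made-up names for illustration only, and the record-count buffer limit is an assumed simplification of a RAM-based buffer. In Lucene, the merge step would correspond to IndexWriter.addIndexes.

```java
import java.util.ArrayList;
import java.util.List;

public class RowIndexSketch {

    /** Toy stand-in for an index: a list of flushed segments (hypothetical, not Lucene). */
    static final class ToyIndex {
        final List<List<String>> segments = new ArrayList<>();

        int recordCount() {
            int n = 0;
            for (List<String> segment : segments) {
                n += segment.size();
            }
            return n;
        }

        /** Merge step: adopt the segments the other index has already flushed. */
        void addIndexes(ToyIndex other) {
            segments.addAll(other.segments);
        }
    }

    /** Toy writer that holds at most maxBuffered records in memory before flushing. */
    static final class ToyWriter {
        private final ToyIndex index;
        private final int maxBuffered;
        private List<String> buffer = new ArrayList<>();
        int peakBuffered = 0; // largest number of records ever held in memory

        ToyWriter(ToyIndex index, int maxBuffered) {
            this.index = index;
            this.maxBuffered = maxBuffered;
        }

        void add(String record) {
            buffer.add(record);
            peakBuffered = Math.max(peakBuffered, buffer.size());
            if (buffer.size() >= maxBuffered) {
                flush();
            }
        }

        void flush() {
            if (!buffer.isEmpty()) {
                index.segments.add(buffer);
                buffer = new ArrayList<>();
            }
        }

        void close() {
            flush();
        }
    }

    /**
     * Streams one whole row into a fresh small index, closes its writer,
     * then merges the small index into the main one.
     * Returns the peak number of records buffered in memory.
     */
    static int indexRow(ToyIndex mainIndex, Iterable<String> records, int maxBuffered) {
        ToyIndex rowIndex = new ToyIndex();
        ToyWriter rowWriter = new ToyWriter(rowIndex, maxBuffered);
        for (String record : records) {
            rowWriter.add(record); // never holds more than maxBuffered records
        }
        rowWriter.close();
        mainIndex.addIndexes(rowIndex);
        return rowWriter.peakBuffered;
    }

    public static void main(String[] args) {
        ToyIndex mainIndex = new ToyIndex();
        List<String> row = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            row.add("record-" + i);
        }
        int peak = indexRow(mainIndex, row, 4);
        System.out.println(mainIndex.recordCount() + " records, peak buffer " + peak);
    }
}
```

The point is that memory use is bounded by the writer's buffer, not by the size of the row, so no records need to be dropped.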

On Fri, May 3, 2013 at 10:59 AM, Tim Williams <[email protected]> wrote:
> Thanks, this helps. I'm looking into patching the BlurReducer so that
> when a Row hits maxRecordsPerRow, it indexes what it can of a row - as
> opposed to dropping it completely. What's a better approach? :)
>
> --tim
>
> On Fri, May 3, 2013 at 10:44 AM, Aaron McCurry <[email protected]> wrote:
> > BlurTask._maxRecordCount
> >
> > This is used for testing, so that you can exit a mapper after N number of
> > records.
> >
> > BlurTask._maxRecordsPerRow
> >
> > This will increase the number of records in a single row. Be careful with
> > this option because this may run the reducer out of memory. I have a patch
> > that I can apply that removes this limit, but for now it's still risky to
> > increase this too much.
> >
> > BlurTask._ramBufferSizeMB
> >
> > This is the Lucene writer buffer, large values normally increase indexing
> > throughput.
> >
> > Aaron
> >
> >
> > On Fri, May 3, 2013 at 10:30 AM, Tim Williams <[email protected]> wrote:
> >
> >> I have an instance where I need to increase max records per row, but
> >> before I do I want to understand the relationship (if there is one)
> >> between:
> >>
> >> BlurTask._maxRecordCount
> >> BlurTask._maxRecordsPerRow
> >> BlurTask._ramBufferSizeMB
> >>
> >> I understand maxRecordsPerRow, but in looking into this found I don't
> >> understand the _maxRecordCount and/or what interplay might exist with
> >> buffer size.
> >>
> >> Thanks,
> >> --tim
> >>
