Thanks, this helps. I'm looking into patching the BlurReducer so that when a Row hits maxRecordsPerRow, it indexes what it can of the row, as opposed to dropping it completely. Is there a better approach? :)
--tim

On Fri, May 3, 2013 at 10:44 AM, Aaron McCurry <[email protected]> wrote:
> BlurTask._maxRecordCount
>
> This is used for testing, so that you can exit a mapper after N number of
> records.
>
> BlurTask._maxRecordsPerRow
>
> This will increase the number of records allowed in a single row. Be
> careful with this option because it may run the reducer out of memory. I
> have a patch that I can apply that removes this limit, but for now it's
> still risky to increase this too far.
>
> BlurTask._ramBufferSizeMB
>
> This is the Lucene writer buffer; larger values normally increase indexing
> throughput.
>
> Aaron
>
>
> On Fri, May 3, 2013 at 10:30 AM, Tim Williams <[email protected]> wrote:
>
>> I have an instance where I need to increase max records per row, but
>> before I do I want to understand the relationship (if there is one)
>> between:
>>
>> BlurTask._maxRecordCount
>> BlurTask._maxRecordsPerRow
>> BlurTask._ramBufferSizeMB
>>
>> I understand maxRecordsPerRow, but in looking into this I found I don't
>> understand _maxRecordCount and/or what interplay might exist with the
>> buffer size.
>>
>> Thanks,
>> --tim
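For what it's worth, the "index what it can" idea can be sketched roughly like this. This is not Blur's actual reducer code; the class, method, and record type below are hypothetical stand-ins for illustration, showing a row being capped at maxRecordsPerRow instead of being discarded wholesale:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: when a row exceeds maxRecordsPerRow, keep the first
// maxRecordsPerRow records and index those, rather than dropping the row.
// None of these names come from the Blur codebase.
public class RowTruncationSketch {

    static <T> List<T> truncateRow(List<T> records, int maxRecordsPerRow) {
        if (records.size() <= maxRecordsPerRow) {
            return records; // under the limit, index everything
        }
        // over the limit: salvage the first maxRecordsPerRow records
        return new ArrayList<>(records.subList(0, maxRecordsPerRow));
    }

    public static void main(String[] args) {
        List<String> row = List.of("r1", "r2", "r3", "r4", "r5");
        List<String> kept = truncateRow(row, 3);
        System.out.println("indexed " + kept.size() + " of " + row.size() + " records");
    }
}
```

One caveat with this approach: whichever records land beyond the cutoff are silently unsearchable, so it would probably be worth logging or counting truncated rows so the loss is visible.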
