[jira] [Commented] (BLUR-95) IndexImporter class - add a double check on the rowid to validate the index.

Aaron McCurry (JIRA) Fri, 24 May 2013 09:34:26 -0700

    [ 
https://issues.apache.org/jira/browse/BLUR-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666437#comment-13666437
 ]


Aaron McCurry commented on BLUR-95:
-----------------------------------

Overall I think your patch is pretty good and the logic is sound; however, 
there are a few optimizations that I would like to see.

1. I would like to see the logic for comparing the shards in the 
applyDeletes method changed a little.  I'm concerned about how many String 
objects are going to be created.  A more optimized way of doing the comparison 
is to take the shard String passed into the applyDeletes method and call "int 
currentShardId = BlurUtil.getShardIndex(shard);" to get the integer index of 
this shard.  Then when you call "int partition = 
blurPartitioner.getPartition(key, null, numberOfShards);" you can just check 
that partition == currentShardId.

2. Also you should reuse Hadoop writable objects like BytesWritable by calling 
set(byte[], int, int) on a single instance instead of creating a new object on 
every iteration with "BytesWritable key = new BytesWritable(rowId.getBytes());".  
If you have to inline the method into the loop to make it easier to reuse the 
objects, that is fine.  I am more concerned about performance than small 
methods.

3. The call "_shardContext.getTableContext().getDescriptor().getShardCount()" 
should be made once before the loop instead of on every iteration through the 
loop.

4. The ref.utf8ToString() call is expensive because it creates a String.  
You should be able to set the bytes into the BytesWritable object without first 
converting them to a String.  This will be much faster, because utf8ToString 
turns the byte[] in the BytesRef into a String and rowId.getBytes() just 
turns it back into a byte[].
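
A minimal sketch of how the four points above might fit together is shown 
below.  It is not the patch itself and does not use the real Blur, Hadoop or 
Lucene classes: ReusableBytes is a hypothetical stand-in for BytesWritable, 
partitionOf() for blurPartitioner.getPartition(), and shardIndex() for 
BlurUtil.getShardIndex(), and the "shard-00000003" naming and the hash function 
are assumptions made only so the example runs on its own.

```java
import java.util.List;

// Hypothetical sketch only: none of these types are the real Blur/Hadoop classes.
class ShardCheckSketch {

  /** Reusable byte buffer, standing in for a Hadoop BytesWritable. */
  static final class ReusableBytes {
    private byte[] bytes = new byte[0];
    private int length;

    /** Copies the given range into this buffer, growing it only when needed. */
    void set(byte[] src, int offset, int len) {
      if (bytes.length < len) {
        bytes = new byte[len];
      }
      System.arraycopy(src, offset, bytes, 0, len);
      length = len;
    }

    /** Hash over the valid bytes only (stale bytes past 'length' are ignored). */
    int hash() {
      int h = 1;
      for (int i = 0; i < length; i++) {
        h = 31 * h + bytes[i];
      }
      return h;
    }
  }

  /** Stand-in partitioner (assumption): non-negative hash modulo shard count. */
  static int partitionOf(ReusableBytes key, int numberOfShards) {
    return (key.hash() & Integer.MAX_VALUE) % numberOfShards;
  }

  /** Stand-in for BlurUtil.getShardIndex: parses the digits after the last '-'. */
  static int shardIndex(String shard) {
    return Integer.parseInt(shard.substring(shard.lastIndexOf('-') + 1));
  }

  /** Counts row ids (raw bytes) whose partition differs from the given shard. */
  static int countMisplaced(String shard, int numberOfShards, List<byte[]> rowIds) {
    // Points 1 and 3: resolve the shard index once, and take the shard count
    // as a value obtained before the loop, so the loop compares plain ints.
    int currentShardId = shardIndex(shard);
    // Point 2: one key object reused for every row id.
    ReusableBytes key = new ReusableBytes();
    int misplaced = 0;
    for (byte[] rowId : rowIds) {
      // Point 4: copy the bytes directly, with no String round trip. With a
      // Lucene BytesRef the arguments would be ref.bytes, ref.offset, ref.length.
      key.set(rowId, 0, rowId.length);
      if (partitionOf(key, numberOfShards) != currentShardId) {
        misplaced++;
      }
    }
    return misplaced;
  }
}
```

The loop allocates nothing per row id once the buffer has grown to the longest 
key, which is the effect the suggestions above are after.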

Thanks!  Let me know if you need any help with these.

Aaron


                
> IndexImporter class - add a double check on the rowid to validate the index.
> ----------------------------------------------------------------------------
>
>                 Key: BLUR-95
>                 URL: https://issues.apache.org/jira/browse/BLUR-95
>             Project: Apache Blur
>          Issue Type: Improvement
>    Affects Versions: 0.1.5
>            Reporter: Aaron McCurry
>             Fix For: 0.1.5
>
>         Attachments: 0001-BLUR-ID-95-double-check-on-the-rowid.patch
>
>
> In the IndexImporter add a double check to the importer that validates the 
> rowids in the import are valid ids for the given shard.  This can be done 
> when the rowids in the new index are iterated over during the delete phase.  
> A BlurPartitioner class can validate that the rowid belongs in the given shard.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
