[
https://issues.apache.org/jira/browse/BLUR-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666437#comment-13666437
]
Aaron McCurry commented on BLUR-95:
-----------------------------------
Overall I think your patch is pretty good, the logic is sound however there a
few optimizations that I would like to see.
1. I would like to see the logic of the comparing the shards in the
applyDeletes method changed a little. I'm concerned about how many String
objects are going to be created. A more optimized way of doing the comparsion
is to take the shard String passed into the applyDeletes method and call int
currentShardId = BlurUtil.getShardIndex(shard) to get the integer of this
shard. Then when you call "int partition = blurPartitioner.getPartition(key,
null, numberOfShards);" you can just check that the partition == currentShardId.
2. Also you should reuse Hadoop writable objects like BytesWritable by using
setBytes() instead of just creating a new object on every iteration
"BytesWritable key = new BytesWritable(rowId.getBytes());". If you have to
just inline the method into the loop to make it easier to reuse the objects
that is fine. I am more concerned about performance than small methods.
3. The code "_shardContext.getTableContext().getDescriptor().getShardCount()"
should be called once before the loop instead of every iteration through the
loop.
4. The ref.utf8ToString() is an expensive call because it creates a String.
You should be able to set the bytes into the BytesWritable object without first
converting it to a String. This will be much faster, because utf8ToString
turns the byte[] in the ByteRef into a String and the rowId.getBytes() just
turns it back into a byte[].
Thanks! Let me know if you need any help with these.
Aaron
> IndexImporter class - add a double check on the rowid to validate the index.
> ----------------------------------------------------------------------------
>
> Key: BLUR-95
> URL: https://issues.apache.org/jira/browse/BLUR-95
> Project: Apache Blur
> Issue Type: Improvement
> Affects Versions: 0.1.5
> Reporter: Aaron McCurry
> Fix For: 0.1.5
>
> Attachments: 0001-BLUR-ID-95-double-check-on-the-rowid.patch
>
>
> In the IndexImporter add a double check to the importer that validates the
> rowids in the import are valid ids for the given shard. This can be done
> when the rowids in the new index are iterated over during the delete phase.
> A BlurPartitioner class can valid the rowid should be in the given shard.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira