[
https://issues.apache.org/jira/browse/PIG-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082955#comment-13082955
]
Dmitriy V. Ryaboy commented on PIG-2114:
----------------------------------------
Apologies again for taking a while to review.
Thanks, that looks like a fair bit of work.
First, just a couple of procedural notes:
1) make sure the new files don't have @author annotations and do have the
apache headers
2) there is already a TestHBaseStorage. Why add a new one in util?
3) This is sort of a PITA request, especially as there are plenty of places in
the codebase that don't adhere to this practice, but can you make sure to do
things like put spaces after commas (as in
Map<family,Map<qualifier,Map<timestamp,value>>>) and before opening parens (as
in for(Map.Entry valueEntry: ...), wrap lines to a reasonable length, etc?
My major concern with the patch is as follows.
In getNext() you inserted a completely new flow that is used if timestamps are
used. It bypasses all the existing logic for how results are created, and as
far as I can see, does not respect things like projection pushdown. It also
makes it so any future work on the hbase loader logic has to happen in two
places. Let's not do that. Isn't loading a single-version row just a special
case of loading multiple versions (with n = 1)? We should be able to do this in
one go.
There being so much stuff mixed in here, I propose we get the smaller stuff
like PIG-2115 in. Some of the things you are doing here are also pretty
non-controversial, like omitNulls and prefix filters, we can get those in
pretty easily. Let's factor out the multiple versions changes and add them to
PIG-1832, leaving this (blessedly unspecifically titled :)) ticket to deal with
the smaller stuff.
> Enhancements to PIG HBaseStorage Load & Store Func with extra scan
> configurations
> ---------------------------------------------------------------------------------
>
> Key: PIG-2114
> URL: https://issues.apache.org/jira/browse/PIG-2114
> Project: Pig
> Issue Type: New Feature
> Components: impl
> Affects Versions: 0.9.0
> Reporter: Hariprasad Kuppuswamy
> Assignee: Hariprasad Kuppuswamy
> Priority: Minor
> Labels: hbase, storage
> Fix For: 0.10
>
> Attachments:
> Enhancments-to-enable-timestampversion-based-row-scan.patch
>
>
> - Added capability to specify scan based on timestamps (Hariprasad
> Kuppuswwamy)
> - Ability to specify number of versions to be fetched with current scan
> (Hariprasad Kuppuswwamy)
> - Configure the rowkey prefixes filter for the scan (Hariprasad Kuppuswwamy)
> - Added ability to omit nulls when dealing with hbase storage (Greg Bowyer)
> - Added ability to specify Put timestamps while insertion (Hariprasad
> Kuppuswamy)
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira