[ 
https://issues.apache.org/jira/browse/PIG-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082955#comment-13082955
 ] 

Dmitriy V. Ryaboy commented on PIG-2114:
----------------------------------------

Apologies again for taking a while to review.

Thanks, that looks like a fair bit of work.

First, just a couple of procedural notes:
1) make sure the new files don't have @author annotations and do have the 
apache headers
2) there is already a TestHBaseStorage. Why add a new one in util?
3) This is sort of a PITA request, especially as there are plenty of places in 
the codebase that don't adhere to this practice, but can you make sure to do 
things like put spaces after commas (as in 
Map<family,Map<qualifier,Map<timestamp,value>>>) and before opening parens (as 
in for(Map.Entry valueEntry: ...),  wrap lines to a reasonable length, etc?

My major concern with the patch is as follows.

In getNext() you inserted a completely new flow that is used if timestamps are 
used. It bypasses all the existing logic for how results are created, and as 
far as I can see, does not respect things like projection pushdown. It also 
makes it so any future work on the hbase loader logic has to happen in two 
places. Let's not do that. Isn't loading a single-version row just a special 
case of loading multiple versions (with n = 1)? We should be able to do this in 
one go.

There being so much stuff mixed in here, I propose we get the smaller stuff 
like PIG-2115 in. Some of the things you are doing here are also pretty 
non-controversial, like omitNulls and prefix filters, we can get those in 
pretty easily. Let's factor out the multiple versions changes and add them to 
PIG-1832, leaving this (blessedly unspecifically titled :)) ticket to deal with 
the smaller stuff.

> Enhancements to PIG HBaseStorage Load & Store Func with extra scan 
> configurations
> ---------------------------------------------------------------------------------
>
>                 Key: PIG-2114
>                 URL: https://issues.apache.org/jira/browse/PIG-2114
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Hariprasad Kuppuswamy
>            Assignee: Hariprasad Kuppuswamy
>            Priority: Minor
>              Labels: hbase, storage
>             Fix For: 0.10
>
>         Attachments: 
> Enhancments-to-enable-timestampversion-based-row-scan.patch
>
>
> - Added capability to specify scan based on timestamps (Hariprasad 
> Kuppuswwamy)
> - Ability to specify number of versions to be fetched with current scan 
> (Hariprasad Kuppuswwamy)
> - Configure the rowkey prefixes filter for the scan (Hariprasad Kuppuswwamy)
> - Added ability to omit nulls when dealing with hbase storage (Greg Bowyer)
> - Added ability to specify Put timestamps while insertion (Hariprasad 
> Kuppuswamy)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to