[ 
https://issues.apache.org/jira/browse/PIG-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543751#comment-13543751
 ] 

Christoph Bauer commented on PIG-3108:
--------------------------------------

Sorry. I've been with this code too long. I will try to explain.

addFiltersWithColumnPrefix and addFiltersWithoutColumnPrefix actually do
different things:

   - addFiltersWithColumnPrefix creates HBase scan filters.
   - addFiltersWithoutColumnPrefix tells the scan object which families and
   columns to retrieve. This is much quicker than adding filters, that's why
   it was changed.

The thing is: to speed things up, the scan object should always be limited
to the families/columns that are needed. In fact we did this already - see
setLocation (it's basically the same as in addFiltersWithoutColumnPrefix).

So what I did was replace the code in setLocation with a call to
addFiltersWithoutColumnPrefix.


To make things clear, we could remove addFiltersWithoutColumnPrefix from
the if/else in initScan() (setLocation will be called anyway) and rename it
to setScanColumns or something.
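
To illustrate why the call order matters, and what an order-independent
setScanColumns could look like, here is a minimal sketch in plain Java.
ScanModel is a hypothetical stand-in that only mimics how I understand the
scan's family map to behave (addFamily resets the family to "everything",
addColumn narrows it to explicit qualifiers). It is not the HBase class, and
the setScanColumns shown here is an assumption about the shape of the fix,
not the code in the patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a scan's family map; NOT the real HBase Scan class.
class ScanModel {
    // family -> requested qualifiers; a null value means "the whole family".
    private final Map<String, Set<String>> familyMap = new LinkedHashMap<>();

    ScanModel addFamily(String family) {
        familyMap.put(family, null); // drops any qualifiers added earlier
        return this;
    }

    ScanModel addColumn(String family, String qualifier) {
        Set<String> qualifiers = familyMap.get(family);
        if (qualifiers == null) {
            // Also taken after addFamily: the family is narrowed to one column.
            qualifiers = new TreeSet<>();
        }
        qualifiers.add(qualifier);
        familyMap.put(family, qualifiers);
        return this;
    }

    boolean fetchesWholeFamily(String family) {
        return familyMap.containsKey(family) && familyMap.get(family) == null;
    }

    Set<String> columns(String family) {
        Set<String> qualifiers = familyMap.get(family);
        return qualifiers == null ? Collections.<String>emptySet() : qualifiers;
    }

    // Assumed fix: decide per family first, so the order of specs is irrelevant.
    // Each spec is expected to look like "family:qualifier"; a trailing '*'
    // marks a wildcard, which requires fetching the whole family.
    static ScanModel setScanColumns(List<String> specs) {
        Set<String> wildcardFamilies = new TreeSet<>();
        for (String spec : specs) {
            if (spec.endsWith("*")) {
                wildcardFamilies.add(spec.substring(0, spec.indexOf(':')));
            }
        }
        ScanModel scan = new ScanModel();
        for (String family : wildcardFamilies) {
            scan.addFamily(family);
        }
        for (String spec : specs) {
            String family = spec.substring(0, spec.indexOf(':'));
            if (!wildcardFamilies.contains(family)) {
                scan.addColumn(family, spec.substring(spec.indexOf(':') + 1));
            }
        }
        return scan;
    }

    public static void main(String[] args) {
        // Case A from the issue: addColumn first, then addFamily.
        ScanModel a = new ScanModel().addColumn("pig", "name").addFamily("pig");
        // Case B from the issue: addFamily first, then addColumn.
        ScanModel b = new ScanModel().addFamily("pig").addColumn("pig", "name");
        System.out.println("A whole family: " + a.fetchesWholeFamily("pig"));
        System.out.println("B whole family: " + b.fetchesWholeFamily("pig"));

        ScanModel fixed = setScanColumns(Arrays.asList("pig:has*", "pig:name"));
        System.out.println("fixed whole family: " + fixed.fetchesWholeFamily("pig"));
    }
}
```

In this model, A ends up requesting the whole family, while B is narrowed to
pig:name only, which would explain the empty map for pig:has*. With the
per-family decision in setScanColumns, both orderings request the whole
family.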



2013/1/3 Bill Graham (JIRA) <[email protected]>


                
> HBaseStorage returns empty maps when mixing wildcard- with other columns
> ------------------------------------------------------------------------
>
>                 Key: PIG-3108
>                 URL: https://issues.apache.org/jira/browse/PIG-3108
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1, 0.9.2, 0.10.0, 0.11, 0.10.1, 0.12
>            Reporter: Christoph Bauer
>             Fix For: 0.12
>
>         Attachments: PIG-3108.patch
>
>
> Consider the following:
> A and B should be the same (with different order, of course).
> {code}
> /*
> in hbase shell:
> create 'pigtest', 'pig'
> put 'pigtest' , '1', 'pig:name', 'A'
> put 'pigtest' , '1', 'pig:has_legs', 'true'
> put 'pigtest' , '1', 'pig:has_ribs', 'true'
> */
> A = LOAD 'hbase://pigtest' USING 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('pig:name pig:has*') AS 
> (name:chararray,parts);
> B = LOAD 'hbase://pigtest' USING 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('pig:has* pig:name') AS 
> (parts,name:chararray);
> dump A;
> dump B;
> {code}
> This is due to a bug in setLocation and initScan.
> For _A_ 
> # scan.addColumn(pig,name); // for 'pig:name'
> # scan.addFamily(pig); // for the 'pig:has*'
> So that's silently right.
> But for _B_
> # scan.addFamily(pig)
> # scan.addColumn(pig,name)
> will override the first call to addFamily, because you cannot mix them on the 
> same family.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
