[
https://issues.apache.org/jira/browse/PIG-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073069#comment-13073069
]
Raghu Angadi commented on PIG-2193:
-----------------------------------
Our tests verify that, when projections are used, we return the correct columns
to users. But how can we verify that the HBase scanner actually fetches only the
projected columns? It seems hard to get hold of the Scan object.
> Problem with HBase loader 0.90.3 and PIG 0.8.1
> ----------------------------------------------
>
> Key: PIG-2193
> URL: https://issues.apache.org/jira/browse/PIG-2193
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.1
> Environment: HBase 0.90.3, Hadoop 0.20-append
> Reporter: Vincent BARAT
> Attachments: PIG-2193.patch
>
>
> I have some data in HBase 0.90.3 and I run a simple script on it.
> This script wrongly returns 0 records. From time to time, under as yet
> undefined conditions, the same script on the same data works (it returns the
> correct data).
> When data are loaded from HDFS instead of HBase, the script runs perfectly.
> Here is the script loading from HDFS (works):
> start_sessions = LOAD 'start_sessions' AS (sid:chararray, infoid:chararray,
> imei:chararray, start:long);
> end_sessions = LOAD 'end_sessions' AS (sid:chararray, end:long,
> locid:chararray);
> infos = LOAD 'infos' AS (infoid:chararray, network_type:chararray,
> network_subtype:chararray, locale:chararray, version_name:chararray,
> carrier_country:chararray, carrier_name:chararray,
> phone_manufacturer:chararray, phone_model:chararray,
> firmware_version:chararray, firmware_name:chararray);
> sessions = JOIN start_sessions BY sid, end_sessions BY sid;
> sessions = FILTER sessions BY end > start AND end - start < 86400000L;
> sessions = JOIN sessions BY infoid, infos BY infoid;
> sessions = LIMIT sessions 100;
> dump sessions;
> The same script loading from HBase (does not work):
> start_sessions = LOAD 'startSession' USING
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:infoid
> meta:imei meta:timestamp') AS (sid:chararray, infoid:chararray,
> imei:chararray, start:long);
> end_sessions = LOAD 'endSession' USING
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp
> meta:locid') AS (sid:chararray, end:long, locid:chararray);
> infos = LOAD 'info' USING
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:infoid
> data:networkType data:networkSubtype data:locale data:applicationVersionName
> data:carrierCountry data:carrierName data:phoneManufacturer data:phoneModel
> data:firmwareVersion data:firmwareName') AS (infoid:chararray,
> network_type:chararray, network_subtype:chararray, locale:chararray,
> version_name:chararray, carrier_country:chararray, carrier_name:chararray,
> phone_manufacturer:chararray, phone_model:chararray,
> firmware_version:chararray, firmware_name:chararray);
> sessions = JOIN start_sessions BY sid, end_sessions BY sid;
> sessions = FILTER sessions BY end > start AND end - start < 86400000L;
> sessions = JOIN sessions BY infoid, infos BY infoid;
> sessions = LIMIT sessions 100;
> dump sessions;
> I guess this definitely means there is a nasty bug in the HBase loader.
--
This message is automatically generated by JIRA.