[ 
https://issues.apache.org/jira/browse/PIG-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988160#action_12988160
 ] 

Dmitriy V. Ryaboy commented on PIG-1828:
----------------------------------------

Ashutosh,
HBase stores records ordered by their keys and splits the keyspace into 
regions as needed. Unlike something like Cassandra, which uses hash 
partitioning by default and can be *made* to use total-order partitioning, 
total order is the *only* thing HBase does.
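
To make the contrast concrete, here is a minimal sketch (not HBase or 
Cassandra code). The region boundaries, row keys, and function names are 
made up for illustration. With sorted region start keys, finding a row's 
region is a binary search, and a scan in key order visits regions in order; 
a hash partitioner makes no such guarantee.

```python
import bisect
import hashlib

# Hypothetical region boundaries: region i covers
# [REGION_START_KEYS[i], REGION_START_KEYS[i + 1]), the last one is open-ended.
REGION_START_KEYS = ["", "g", "n", "t"]


def region_for_key(row_key, start_keys=REGION_START_KEYS):
    """Total-order partitioning: index of the region whose range holds row_key."""
    return bisect.bisect_right(start_keys, row_key) - 1


def hash_partition_for_key(row_key, num_partitions=4):
    """Hash partitioning, for contrast: neighbouring keys may land anywhere."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


keys = sorted(["apple", "grape", "kiwi", "mango", "peach", "zebra"])
regions = [region_for_key(k) for k in keys]
partitions = [hash_partition_for_key(k) for k in keys]

print(regions)     # region indices never decrease as the keys increase
print(partitions)  # no ordering relationship with the keys
```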

Indeed, implementing OLF (OrderedLoadFunc) didn't solve my problem, as the 
splits were still combined. I don't know if TableSplits are stateful.
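
For anyone following along, this is roughly what "combined" means here, as a 
toy model only. It is not Pig's actual split-combination code; the 
RegionSplit class, the combine_splits function, and the sizes are all 
hypothetical. It assumes small adjacent splits are packed greedily into one 
mapper's input up to a size limit, so a table with several small regions can 
end up with a single mapper.

```python
from dataclasses import dataclass


@dataclass
class RegionSplit:
    """Hypothetical stand-in for one per-region input split."""
    start_key: str
    end_key: str
    size_bytes: int


def combine_splits(splits, max_combined_bytes):
    """Greedily pack consecutive splits into groups of at most max_combined_bytes.

    Each returned group models the input of one mapper.
    """
    combined = []
    current, current_size = [], 0
    for split in splits:
        # Close the current group if adding this split would exceed the limit.
        if current and current_size + split.size_bytes > max_combined_bytes:
            combined.append(current)
            current, current_size = [], 0
        current.append(split)
        current_size += split.size_bytes
    if current:
        combined.append(current)
    return combined


splits = [
    RegionSplit("", "g", 10),
    RegionSplit("g", "n", 10),
    RegionSplit("n", "", 10),
]

one_mapper = combine_splits(splits, max_combined_bytes=100)
per_region = combine_splits(splits, max_combined_bytes=10)

print(len(one_mapper))  # 1: all three regions share a single mapper
print(len(per_region))  # 3: one mapper per region
```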

> HBaseStorage has problems with processing multiregion tables
> ------------------------------------------------------------
>
>                 Key: PIG-1828
>                 URL: https://issues.apache.org/jira/browse/PIG-1828
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>         Environment: Hadoop 0.20.2, Hbase 0.20.6, Distributed mode
>            Reporter: Lukas
>
> As brought up on the Pig user mailing list 
> (http://www.mail-archive.com/user%40pig.apache.org/msg00606.html), Pig 
> sometimes does not scan the full HBase table.
> It seems that HBaseStorage has problems scanning large tables. It issues just 
> one mapper job instead of one mapper job per table region.
> Ian Stevens, who brought this issue up on the mailing list, attached a script 
> to reproduce the problem (https://gist.github.com/766929).
> However, in my case, the problem only occurred after the table was split 
> into more than one region.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
