[ https://issues.apache.org/jira/browse/PIG-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987979#action_12987979 ]
Dmitriy V. Ryaboy commented on PIG-1828:
----------------------------------------
Found the issue!
It turns out HBaseStorage is doing the right thing and returning the correct
set of splits, but PIG-1518 is merging the splits back into a single split. No
wonder I wasn't seeing it: I was running with split combination turned off.
Short-term fix: set pig.splitCombination to false.
Long-term fix: I added an OrderedLoadFunc implementation to the loader, so
that PIG-1518 doesn't apply. I think this is correct, since TableSplits are in
fact comparable, but I am not sure exactly what consequences implementing this
interface will have with regard to merge joins and such. Ashutosh, can you
comment?
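To make the interaction concrete, here is a toy model of the behaviour described
above. It is only a sketch and not Pig's actual code: the class, the method
planMapTasks, its parameters and the size limit are all hypothetical, and it
assumes the combiner packs splits together until a size limit is reached and
skips loaders that are marked as ordered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only. Every name here is hypothetical; this is not Pig's code.
final class SplitCombinationSketch {

    /** One input split (think: one HBase region) with the size it reports. */
    static final class Split {
        final String region;
        final long sizeBytes;

        Split(String region, long sizeBytes) {
            this.region = region;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Groups splits into map tasks. Splits are packed together up to
     * maxCombinedBytes unless combination is switched off or the loader is
     * ordered (assumed here to exempt it from combination).
     */
    static List<List<String>> planMapTasks(List<Split> splits,
                                           boolean combineSplits,
                                           boolean loaderIsOrdered,
                                           long maxCombinedBytes) {
        List<List<String>> tasks = new ArrayList<>();
        if (!combineSplits || loaderIsOrdered) {
            // No combination: every split becomes its own map task.
            for (Split s : splits) {
                tasks.add(Collections.singletonList(s.region));
            }
            return tasks;
        }
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (Split s : splits) {
            // Start a new task when adding this split would exceed the limit.
            if (!current.isEmpty() && currentBytes + s.sizeBytes > maxCombinedBytes) {
                tasks.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(s.region);
            currentBytes += s.sizeBytes;
        }
        if (!current.isEmpty()) {
            tasks.add(current);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Three regions whose splits look small to the combiner (an assumption).
        List<Split> regions = Arrays.asList(
                new Split("region-1", 1L),
                new Split("region-2", 1L),
                new Split("region-3", 1L));
        long limit = 128L * 1024 * 1024; // hypothetical combined-size limit

        System.out.println(planMapTasks(regions, true, false, limit));  // combination on
        System.out.println(planMapTasks(regions, false, false, limit)); // short-term fix
        System.out.println(planMapTasks(regions, true, true, limit));   // ordered loader
    }
}
```

In this model the three small splits collapse into one map task when
combination is on, and stay as three tasks either when the flag is off or when
the loader is treated as ordered. That matches the symptom in the report, but
the real thresholds and rules live in Pig itself.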
For the folks using the EB version -- you are not affected, since this is only
a 0.8 problem.
> HBaseStorage has problems with processing multiregion tables
> ------------------------------------------------------------
>
> Key: PIG-1828
> URL: https://issues.apache.org/jira/browse/PIG-1828
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0
> Environment: Hadoop 0.20.2, Hbase 0.20.6, Distributed mode
> Reporter: Lukas
>
> As brought up on the pig user mailing list
> (http://www.mail-archive.com/user%40pig.apache.org/msg00606.html), Pig
> sometimes does not scan the full HBase table.
> It seems that HBaseStorage has problems scanning large tables. It issues just
> one mapper job instead of one mapper job per table region.
> Ian Stevens, who brought this issue up in the mailing list, attached a script
> to reproduce the problem (https://gist.github.com/766929).
> However, in my case, the problem only occurred after the table was split
> into more than one region.