[
https://issues.apache.org/jira/browse/PIG-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988022#action_12988022
]
Ashutosh Chauhan commented on PIG-1828:
---------------------------------------
I don't think we have sufficient evidence yet to point the finger at split
combination for this bug. In theory, combining multiple TableSplits into one
split within Pig should not cause any problem, provided the semantics of
InputFormat imposed by the MR framework are honored: each split is stateless,
in the sense that it doesn't maintain any state. One TableSplit should know
nothing about another one. I don't know enough about TableSplit, but I would
assume they are indeed stateless.
OrderedLoadFunc tries to impose this restriction by defining an order on
splits. It dictates that all keys in one split are smaller than those in
another one. Thus, ideally, Pig should *not* combine splits from loaders that
implement it. But for the reasons discussed in PIG-1518, it was eventually
decided that, for the feature to be useful, Pig would skip combination for
OrderedLoadFunc loaders *only* if the loader is also used for MergeJoin or
map-side cogroups in the script. So adding OLF won't turn off combination in
all cases. If you suspect combination is causing a bug (potentially because
TableSplits are stateful w.r.t. each other), then only setting the flag to
false will ensure no combination. But I doubt that TableSplits have state and
that split combination is causing the bug. Ian, Lukas, can you confirm whether
setting pig.splitCombination to false makes the bug go away?
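For anyone running that check, a minimal sketch of turning combination off for
a single script might look like the following. The table name 'mytable' and
the column 'cf:col' are placeholders, not taken from the reproduction script,
and whether the set command accepts this property depends on the Pig version;
passing -Dpig.splitCombination=false on the pig command line is an
alternative.

```pig
-- Disable split combination for this script only.
set pig.splitCombination 'false';

-- 'mytable' and 'cf:col' are hypothetical placeholders.
rows = LOAD 'mytable'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:col')
       AS (col:bytearray);

-- Count the loaded rows so the total can be compared with the known table
-- size. Note that COUNT skips tuples whose first field is null.
grouped = GROUP rows ALL;
total = FOREACH grouped GENERATE COUNT(rows);
DUMP total;
```

If the job then launches one map task per region and the whole table is
scanned, that would point at combination; if the rows are still missing, the
cause lies elsewhere.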
> HBaseStorage has problems with processing multiregion tables
> ------------------------------------------------------------
>
> Key: PIG-1828
> URL: https://issues.apache.org/jira/browse/PIG-1828
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0
> Environment: Hadoop 0.20.2, Hbase 0.20.6, Distributed mode
> Reporter: Lukas
>
> As brought up on the Pig user mailing list
> (http://www.mail-archive.com/user%40pig.apache.org/msg00606.html), Pig
> sometimes does not scan the full HBase table.
> It seems that HBaseStorage has problems scanning large tables. It issues just
> one mapper job instead of one mapper job per table region.
> Ian Stevens, who brought this issue up in the mailing list, attached a script
> to reproduce the problem (https://gist.github.com/766929).
> However, in my case, the problem only occurred after the table was split
> into more than one region.