[
https://issues.apache.org/jira/browse/PIG-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988159#action_12988159
]
Ashutosh Chauhan commented on PIG-1828:
---------------------------------------
Thanks Lukas for checking. This indicates that TableSplits are not
combinable. Thinking more about it, I think Pig's basic assumption, that splits
can be combined in general and are left uncombined only in special cases (which
Pig checks itself), is not correct. Whether splits can be combined should really
be asked of the loader, not assumed. Also, this OLF (OrderedLoadFunc) thing is
too complicated. The condition imposed by OLF is one possibility, but I assume
there are other scenarios where a loader is not OLF but is still not combinable.
I would propose adding a new method to LoadFunc that asks the loader directly,
and dropping all the logic that determines whether splits are combinable.
{java}
// By default, splits generated by a loader are considered combinable,
// to preserve current behavior.
public boolean isCombinable() {
    return true;
}
{java}
The good thing is that LoadFunc is an abstract class, so this won't break
backward compatibility.
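To make the proposal concrete, here is a minimal, self-contained sketch of how a planner could consult the loader instead of inferring combinability. It does not use Pig's classes: LoaderSketch, FileLoaderSketch, TableLoaderSketch and SplitPlannerSketch are hypothetical stand-ins, and only isCombinable() comes from the proposal above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for LoadFunc; only the proposed method is modeled.
abstract class LoaderSketch {
    // Default keeps current behavior: splits are assumed combinable.
    public boolean isCombinable() {
        return true;
    }
}

// Hypothetical loader that accepts the default.
class FileLoaderSketch extends LoaderSketch {
}

// Hypothetical table loader whose splits must stay separate.
class TableLoaderSketch extends LoaderSketch {
    @Override
    public boolean isCombinable() {
        return false;
    }
}

class SplitPlannerSketch {
    // Groups splits into one combined group when the loader allows it;
    // otherwise every split stays in its own group.
    static List<List<String>> plan(LoaderSketch loader, List<String> splits) {
        List<List<String>> planned = new ArrayList<>();
        if (loader.isCombinable() && !splits.isEmpty()) {
            planned.add(new ArrayList<>(splits));
        } else {
            for (String split : splits) {
                List<String> single = new ArrayList<>();
                single.add(split);
                planned.add(single);
            }
        }
        return planned;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("region-1", "region-2", "region-3");
        System.out.println(plan(new FileLoaderSketch(), splits).size());
        System.out.println(plan(new TableLoaderSketch(), splits).size());
    }
}
```

With this shape the planner needs no loader-specific checks: a loader that cannot tolerate combination simply overrides the method.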
@Dmitriy,
As I pointed out above, adding OLF to HBaseStorage will not help, though it
won't hurt either. A quick fix for the HBaseStorage loader for now is to set the
key to false somewhere early. I think setLocation() or setSchema() is one of the
first methods called on a LoadFunc, and since the checks that decide on
combination happen much later, the loader's setting of that key to false will be
seen and combination won't happen. That avoids having to tell users of
HBaseStorage to set the key themselves.
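The ordering argument behind the quick fix can be sketched as follows. This is only an illustration under stated assumptions: it uses java.util.Properties in place of the job configuration, the key name "example.split.combination" is a placeholder (the real key is not named in this comment), and setLocation() and shouldCombine() here are hypothetical simplifications, not Pig's signatures.

```java
import java.util.Properties;

class QuickFixSketch {
    // Placeholder name; the actual configuration key is not named in this comment.
    static final String COMBINE_KEY = "example.split.combination";

    // Hypothetical early loader hook: turns combination off before planning runs.
    static void setLocation(String location, Properties jobConf) {
        jobConf.setProperty(COMBINE_KEY, "false");
    }

    // Hypothetical later check: combination stays on unless the key says otherwise.
    static boolean shouldCombine(Properties jobConf) {
        return Boolean.parseBoolean(jobConf.getProperty(COMBINE_KEY, "true"));
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties();
        System.out.println(shouldCombine(jobConf)); // before the loader is set up
        setLocation("hbase://example_table", jobConf);
        System.out.println(shouldCombine(jobConf)); // after the loader has set the key
    }
}
```

Because the loader writes the key before the check reads it, users would not need to set anything themselves.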
> HBaseStorage has problems with processing multiregion tables
> ------------------------------------------------------------
>
> Key: PIG-1828
> URL: https://issues.apache.org/jira/browse/PIG-1828
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0
> Environment: Hadoop 0.20.2, Hbase 0.20.6, Distributed mode
> Reporter: Lukas
>
> As brought up on the Pig user mailing list
> (http://www.mail-archive.com/user%40pig.apache.org/msg00606.html), Pig
> sometimes does not scan the full HBase table.
> It seems that HBaseStorage has problems scanning large tables. It issues just
> one mapper job instead of one mapper job per table region.
> Ian Stevens, who brought this issue up in the mailing list, attached a script
> to reproduce the problem (https://gist.github.com/766929).
> However, in my case, the problem only occurred after the table was split
> into more than one region.