[
https://issues.apache.org/jira/browse/PIG-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mohit Sabharwal updated PIG-4585:
---------------------------------
Description:
LoadConverter currently uses SparkContext.newAPIHadoopFile, which won't work for
non-filesystem-based input sources such as HBase.
newAPIHadoopFile assumes a FileInputFormat and attempts to
[verify|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L1065]
this up front with the call below, which fails for HBaseTableInputFormat (which
is not a FileInputFormat):
{code}
NewFileInputFormat.setInputPaths(job, path)
{code}
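For comparison, here is a rough sketch of what the newAPIHadoopRDD route could look like. It is not the LoadConverter patch itself: the object name, the input path, and the use of Hadoop's TextInputFormat are placeholders chosen only to keep the example self-contained. The point it tries to show is that newAPIHadoopRDD takes a Configuration and leaves all input setup to the caller, so Spark itself never calls setInputPaths.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone example; names and paths are illustrative only.
object NewAPIHadoopRDDSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("newAPIHadoopRDD-sketch").setMaster("local[*]"))
    try {
      // Placeholder input location; replace with a real path.
      val inputPath = "/tmp/example-input"

      // The caller prepares the job configuration. For a file-based format
      // that means setting input paths here, in user code, rather than
      // having Spark do it inside newAPIHadoopFile.
      val job = Job.getInstance(sc.hadoopConfiguration)
      FileInputFormat.setInputPaths(job, inputPath)

      // newAPIHadoopRDD only needs the configuration and the classes; it
      // makes no assumption that the InputFormat is a FileInputFormat.
      val rdd = sc.newAPIHadoopRDD(
        job.getConfiguration,
        classOf[TextInputFormat],
        classOf[LongWritable],
        classOf[Text])

      println(s"records read: ${rdd.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

A non-file source would presumably skip the setInputPaths step and put its own settings (for HBase, the table to scan) into the Configuration before calling newAPIHadoopRDD.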
was:
LoadConverter currently uses SparkContext.newAPIHadoopFile which won't work for
non-filesystem based input sources, like HBase.
newAPIHadoopFile assumes a FileInputFormat and attempts to
[verify|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L1065]
this in the constructor, which fails for HBaseTableInputFormat (which is not a
FileInputFormat)
> Use newAPIHadoopRDD instead of newAPIHadoopFile
> -----------------------------------------------
>
> Key: PIG-4585
> URL: https://issues.apache.org/jira/browse/PIG-4585
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Affects Versions: spark-branch
> Reporter: Mohit Sabharwal
> Assignee: Mohit Sabharwal
> Fix For: spark-branch