[
https://issues.apache.org/jira/browse/SPARK-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin Huai resolved SPARK-11544.
------------------------------
Resolution: Fixed
Fix Version/s: 1.6.0
Issue resolved by pull request 9652
[https://github.com/apache/spark/pull/9652]
> sqlContext doesn't use PathFilter
> ---------------------------------
>
> Key: SPARK-11544
> URL: https://issues.apache.org/jira/browse/SPARK-11544
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0
> Environment: AWS EMR 4.1.0, Spark 1.5.0
> Reporter: Frank Dai
> Assignee: Dilip Biswal
> Fix For: 1.6.0
>
>
> When {{sqlContext}} reads JSON files, it doesn't apply the {{PathFilter}}
> configured on the underlying SparkContext's Hadoop configuration:
> {code:java}
> val sc = new SparkContext(conf)
> sc.hadoopConfiguration.setClass("mapreduce.input.pathFilter.class",
>   classOf[TmpFileFilter], classOf[PathFilter])
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> {code}
> The definition of {{TmpFileFilter}} is:
> {code:title=TmpFileFilter.scala|borderStyle=solid}
> import org.apache.hadoop.fs.{Path, PathFilter}
>
> class TmpFileFilter extends PathFilter {
>   override def accept(path: Path): Boolean = !path.getName.endsWith(".tmp")
> }
> {code}
> When using {{sqlContext}} to read JSON files, e.g.,
> {{sqlContext.read.schema(mySchema).json(s3Path)}}, Spark throws an
> exception:
> {quote}
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> s3://chef-logstash-access-backup/2015/10/21/00/logstash-172.18.68.59-s3.1445388158944.gz.tmp
> {quote}
> It seems {{sqlContext}} can see {{.tmp}} files while {{sc}} cannot, which
> causes the above exception.
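Until the fix in 1.6.0, one workaround is to drop {{.tmp}} paths before handing them to the reader. The sketch below applies the same predicate as {{TmpFileFilter.accept}} to plain path strings; {{TmpPathWorkaround}} and its methods are illustrative names, not part of Spark's API:

```scala
// Hypothetical workaround sketch: since the DataFrameReader ignores the
// mapreduce.input.pathFilter.class setting here, filter .tmp paths up front.
object TmpPathWorkaround {
  // Same predicate as TmpFileFilter.accept, expressed on plain path strings.
  def keep(path: String): Boolean = !path.endsWith(".tmp")

  // Retain only the paths the PathFilter would have accepted.
  def filterPaths(paths: Seq[String]): Seq[String] = paths.filter(keep)
}
```

The filtered list can then be read file by file, e.g. {{filterPaths(myPaths).map(p => sqlContext.read.schema(mySchema).json(p))}}, assuming the paths have been enumerated up front.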
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]