Yea, please report the bug on a supported Spark version like 2.4.

On Thu, Apr 23, 2020 at 3:40 PM Dhrubajyoti Hati <dhruba.w...@gmail.com> wrote:

> FYI, we are using Spark 2.2.0. Should the change be present in this Spark
> version? I wanted to check before opening a JIRA ticket.
>
> *Regards,*
> *Dhrubajyoti Hati*
>
> On Thu, Apr 23, 2020 at 10:12 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> This looks like a bug: the path filter doesn't work for Hive table
>> reading. Can you open a JIRA ticket?
>>
>> On Thu, Apr 23, 2020 at 3:15 AM Dhrubajyoti Hati <dhruba.w...@gmail.com>
>> wrote:
>>
>>> Just wondering if anyone could help me out on this.
>>>
>>> Thank you!
>>>
>>> *Regards,*
>>> *Dhrubajyoti Hati*
>>>
>>> On Wed, Apr 22, 2020 at 7:15 PM Dhrubajyoti Hati <dhruba.w...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is there any way to discard files starting with a dot (.) or ending
>>>> with .tmp in a Hive partition while reading from a Hive table using the
>>>> spark.read.table method?
>>>>
>>>> I tried using PathFilters, but they didn't work. I am using
>>>> spark-submit and passing my Python file (PySpark) containing the source
>>>> code:
>>>>
>>>> spark.sparkContext._jsc.hadoopConfiguration().set("mapreduce.input.pathFilter.class",
>>>> "com.abc.hadoop.utility.TmpFileFilter")
>>>>
>>>> where TmpFileFilter is defined in Scala as:
>>>>
>>>> class TmpFileFilter extends PathFilter {
>>>>   override def accept(path: Path): Boolean =
>>>>     !path.getName.endsWith(".tmp")
>>>> }
>>>>
>>>> Still, in the detailed logs I can see the .tmp files are being
>>>> considered:
>>>>
>>>> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus
>>>> maprfs:///a/hour=05/host=abc/FlumeData.1587559137715
>>>> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus
>>>> maprfs:///a/hour=05/host=abc/FlumeData.1587556815621
>>>> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus
>>>> maprfs:///a/hour=05/host=abc/.FlumeData.1587560277337.tmp
>>>>
>>>> Is there any way to discard the .tmp files or the hidden files
>>>> (filenames starting with a dot or underscore) in Hive partitions while
>>>> reading from Spark?
>>>>
>>>> *Regards,*
>>>> *Dhrubajyoti Hati*
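[Editor's note: since the path filter is ignored on the Hive-read path in Spark 2.2, one hedged workaround is to list and filter the partition's files on the driver yourself, then load only the surviving paths with `spark.read` instead of `spark.read.table`. The sketch below shows the filter logic in plain Python, using the filenames from the debug logs above; the `is_visible_data_file` helper name is illustrative, not from the thread, and the Spark usage is shown only in comments.]

```python
def is_visible_data_file(name: str) -> bool:
    """Return True for files Spark should read: drop .tmp files and
    hidden files whose names start with a dot or underscore."""
    return not (
        name.startswith(".")
        or name.startswith("_")
        or name.endswith(".tmp")
    )

# Hedged usage sketch: list the partition directory (e.g. via the Hadoop
# FileSystem API through spark.sparkContext._jvm, or any other listing),
# filter the basenames, and pass the surviving paths to spark.read:
#
#   good = [p for p in all_paths
#           if is_visible_data_file(p.rsplit("/", 1)[-1])]
#   df = spark.read.format("...").load(good)  # format as appropriate
#
# Demo with the filenames from the debug logs in the thread:
files = [
    "FlumeData.1587559137715",
    "FlumeData.1587556815621",
    ".FlumeData.1587560277337.tmp",
]
print([f for f in files if is_visible_data_file(f)])
# prints ['FlumeData.1587559137715', 'FlumeData.1587556815621']
```

This mirrors the dot/underscore/tmp convention the original question asks about; whether it matches Spark's own hidden-file rules exactly should be checked against the Spark version in use.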