[ https://issues.apache.org/jira/browse/SPARK-14273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885086#comment-15885086 ]
Hyukjin Kwon commented on SPARK-14273:
--------------------------------------

Hi [~yhuai] and [~liancheng], it seems {{FileFormat.isSplitable}} already exists, introduced via SPARK-15654 and https://github.com/apache/spark/pull/13531. If I understood correctly, could we resolve this JIRA?

> Add FileFormat.isSplittable to indicate whether a format is splittable
> ----------------------------------------------------------------------
>
>                 Key: SPARK-14273
>                 URL: https://issues.apache.org/jira/browse/SPARK-14273
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Cheng Lian
>
> {{FileSourceStrategy}} assumes that all data source formats are splittable
> and always splits data files by a fixed partition size. However, not all
> HDFS-based formats are splittable. We need a flag to indicate that and ensure
> that non-splittable files won't be split into multiple Spark partitions.
> (PS: Is it "splitable" or "splittable"? Probably the latter? Hadoop uses
> the former, though...)
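
To illustrate the behaviour the issue description asks for, here is a minimal Python sketch. It is not Spark's implementation: {{plan_splits}}, {{FileSplit}} and the parameter names are hypothetical, chosen only to show how a per-format splittable flag could decide whether a file is cut into fixed-size chunks or kept as a single partition.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileSplit:
    """Hypothetical record describing one byte range of a file."""
    path: str
    start: int
    length: int


def plan_splits(path: str, file_size: int, max_split_bytes: int,
                is_splitable: bool) -> List[FileSplit]:
    """Plan byte ranges for one file (illustrative only, not Spark's API).

    A non-splittable file (for example, a gzip-compressed one) is returned
    as a single range covering the whole file. A splittable file is cut
    into consecutive ranges of at most max_split_bytes each.
    """
    if max_split_bytes <= 0:
        raise ValueError("max_split_bytes must be positive")
    if not is_splitable:
        # The flag is off: never divide the file across partitions.
        return [FileSplit(path, 0, file_size)]
    return [
        FileSplit(path, offset, min(max_split_bytes, file_size - offset))
        for offset in range(0, file_size, max_split_bytes)
    ]
```

With a 300-byte file and a 128-byte limit, the splittable case yields three ranges, while the non-splittable case yields one range covering all 300 bytes.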