[ https://issues.apache.org/jira/browse/SPARK-15826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shixiong Zhu resolved SPARK-15826.
----------------------------------
       Resolution: Fixed
         Assignee: Tejas Patil
    Fix Version/s: 2.0.0

> PipedRDD to allow configurable char encoding
> --------------------------------------------
>
>                 Key: SPARK-15826
>                 URL: https://issues.apache.org/jira/browse/SPARK-15826
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Tejas Patil
>            Assignee: Tejas Patil
>            Priority: Trivial
>             Fix For: 2.0.0
>
>
> Encountered an issue wherein the same code, given the same input, works on
> one cluster but fails on another. After debugging, I realised that PipedRDD
> picks up the default char encoding from the JVM, which may differ across
> platforms. Making it use UTF-8 encoding, just like `ScriptTransformation`
> does.
> Stack trace:
> {noformat}
> Caused by: java.nio.charset.MalformedInputException: Input length = 1
>       at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
>       at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
>       at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>       at java.io.InputStreamReader.read(InputStreamReader.java:184)
>       at java.io.BufferedReader.fill(BufferedReader.java:161)
>       at java.io.BufferedReader.readLine(BufferedReader.java:324)
>       at java.io.BufferedReader.readLine(BufferedReader.java:389)
>       at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:67)
>       at org.apache.spark.rdd.PipedRDD$$anon$1.hasNext(PipedRDD.scala:185)
>       at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1612)
>       at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
>       at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
>       at org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
>       at org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
>       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>       at org.apache.spark.scheduler.Task.run(Task.scala:89)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
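
The failure in the stack trace above can be reproduced outside Spark. What follows is a minimal sketch in Java, not the actual PipedRDD change. It assumes a strict decoder (one that reports malformed input instead of replacing it) and decodes the same UTF-8 bytes twice: once as UTF-8, and once as US-ASCII, which stands in for a JVM whose default charset differs from the encoding of the piped process output. The class name PipedEncodingSketch and the helper readLineStrict are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; this is not code from Spark.
final class PipedEncodingSketch {

    // Reads the first line from raw bytes with a strict decoder for the
    // given charset. Malformed input is reported as an exception rather
    // than replaced with a substitution character.
    static String readLineStrict(byte[] bytes, Charset charset) throws IOException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), decoder))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulated output of a piped process that writes UTF-8.
        // "\u00e9" (e with acute accent) takes two bytes in UTF-8.
        byte[] processOutput = "caf\u00e9\n".getBytes(StandardCharsets.UTF_8);

        // Explicit UTF-8: decoding does not depend on the platform default.
        String line = readLineStrict(processOutput, StandardCharsets.UTF_8);
        System.out.println("UTF-8 decode succeeded, chars read: " + line.length());

        // Mismatched charset: the first non-ASCII byte is rejected.
        try {
            readLineStrict(processOutput, StandardCharsets.US_ASCII);
            System.out.println("US-ASCII decode succeeded unexpectedly");
        } catch (MalformedInputException e) {
            System.out.println("US-ASCII decode failed: " + e);
        }
    }
}
```

Passing the charset explicitly, rather than relying on Charset.defaultCharset(), is what makes the result the same on every machine; that is the general idea behind the fix described in the issue.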