RE: sc.textFile() on windows cannot access UNC path
Thanks for the reference. Is the following procedure correct?

1. Copy the Hadoop source file org.apache.hadoop.mapreduce.lib.input.TextInputFormat.java as my own class, e.g. UncTextInputFormat.java
2. Modify UncTextInputFormat.java to handle UNC paths
3. Call sc.newAPIHadoopFile(...) with:

sc.newAPIHadoopFile[LongWritable, Text, UncTextInputFormat]("file:////10.196.119.230/folder1/abc.txt", classOf[UncTextInputFormat], classOf[LongWritable], classOf[Text], conf)

Ningjun
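For reference, the three steps above might be wired together roughly as follows. This is only a sketch: UncTextInputFormat is the hypothetical class from steps 1–2, and the quadruple-slash file: URI is an assumed encoding of the UNC path \\10.196.119.230\folder1\abc.txt that may need experimenting.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.SparkContext

// Steps 1-2 (hypothetical): a copy of TextInputFormat, patched so the
// UNC authority (\\host\share) survives Hadoop's path parsing:
// class UncTextInputFormat extends FileInputFormat[LongWritable, Text] { ... }

def countUncFile(sc: SparkContext): Long = {
  val conf = new Configuration()
  // Step 3: read (path, line) pairs through the custom input format.
  val rdd = sc.newAPIHadoopFile[LongWritable, Text, UncTextInputFormat](
    "file:////10.196.119.230/folder1/abc.txt",  // URI form is an assumption
    classOf[UncTextInputFormat],
    classOf[LongWritable],
    classOf[Text],
    conf)
  rdd.map(_._2.toString).count()
}
```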
From: ningjun.w...@lexisnexis.com
To: user@spark.apache.org
Subject: sc.textFile() on windows cannot access UNC path
Date: Mon, 9 Mar 2015 21:09:38 +0000

I am running Spark on Windows 2008 R2. When I use sc.textFile() to load a text file via a UNC path, it does not work:

sc.textFile(raw"file:\\10.196.119.230\folder1\abc.txt", 4).count()

Input path does not exist: file:/10.196.119.230/folder1/abc.txt
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/10.196.119.230/tar/Enron/enron-207-short.load
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
    at org.apache.spark.rdd.RDD.count(RDD.scala:910)
    at ltn.analytics.tests.IndexTest$$anonfun$3.apply$mcV$sp(IndexTest.scala:104)
Re: sc.textFile() on windows cannot access UNC path
Sounds like the way to do it. Could you first try accessing a file on the UNC path with native Java NIO code, to make sure the URI file:////10.196.119.230/folder1/abc.txt is reachable at all?

Thanks
Best Regards
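A minimal NIO check of that kind might look like the following (a sketch assuming JDK 7+ on the same Windows machine that runs the Spark driver; the backslash UNC literal is the thread's example path):

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Can java.nio see the share at all? raw"..." keeps the backslashes literal.
val p = Paths.get(raw"\\10.196.119.230\folder1\abc.txt")
println(Files.exists(p))  // true only if the share is reachable
if (Files.exists(p)) {
  // Read the whole file to confirm read permission, not just visibility.
  val lines = Files.readAllLines(p, StandardCharsets.UTF_8)
  println(s"read ${lines.size} lines")
}
```

If this prints false or throws, the problem is the Windows/Java side, not Spark.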
Re: sc.textFile() on windows cannot access UNC path
I don't have a complete example for your use case, but you can see a lot of code showing how to use newAPIHadoopFile here: https://github.com/search?q=sc.newAPIHadoopFile&type=Code&utf8=%E2%9C%93

Thanks
Best Regards
RE: sc.textFile() on windows cannot access UNC path
I think the work-around is clear: use JDK 7, and implement your own saveAsRemoteWinText() using java.nio.file.Path.

Yong
RE: sc.textFile() on windows cannot access UNC path
This is a Java problem, not really Spark. From this page: http://stackoverflow.com/questions/18520972/converting-java-file-url-to-file-path-platform-independent-including-u you can see that using java.nio.* on JDK 7 will fix this issue. But the Path class in Hadoop uses java.io.* instead of java.nio, so you need to manually mount your Windows remote share as a local drive, like Z:, and then it should work.

Yong
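As a sketch of that drive-mapping work-around (the Z: letter and share name are just the thread's example; `net use` has to be run once per machine, under the account that runs Spark, before the job starts):

```scala
// On each Windows node, map the share first from a command prompt:
//   net use Z: \\10.196.119.230\folder1
// After that, a plain local-drive URI works with the stock sc.textFile:
val count = sc.textFile("file:///Z:/abc.txt", 4).count()
println(count)
```

The limitation, as noted later in the thread, is that the mapping is static: it cannot easily be created on the fly for an arbitrary UNC path supplied at runtime.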
RE: sc.textFile() on windows cannot access UNC path
Hi Yong

Thanks for the reply. Yes, it works with a local drive letter, but I really need to use a UNC path because the path is supplied as input at runtime, and I cannot dynamically assign a drive letter to an arbitrary UNC path at runtime. Is there any work-around so that I can use a UNC path with sc.textFile(...)?

Ningjun