RE: sc.textFile() on windows cannot access UNC path

2015-03-12 Thread Wang, Ningjun (LNG-NPV)
Thanks for the reference. Is the following procedure correct?

1. Copy the Hadoop source code 
org.apache.hadoop.mapreduce.lib.input.TextInputFormat.java as my own class, 
e.g. UncTextInputFormat.java
2. Modify UncTextInputFormat.java to handle UNC paths
3. Call sc.newAPIHadoopFile(…) with

sc.newAPIHadoopFile[LongWritable, Text, 
UncTextInputFormat]("file:10.196.119.230/folder1/abc.txt",
 classOf[UncTextInputFormat],
 classOf[LongWritable],
 classOf[Text], conf)
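For step 2, the UNC-to-URI rewriting inside the new class could start from a small helper like the sketch below. This is only an illustration under my own assumptions: UncPaths and toFileUri are hypothetical names, not part of Hadoop, and the four-slash file://// form is one common way to keep the host inside the URI's path component.

```java
// Hypothetical helper for step 2: rewrite a Windows UNC path into a
// file: URI string before handing it to Hadoop/Spark.
public class UncPaths {

    // "\\10.196.119.230\folder1\abc.txt" -> "file:////10.196.119.230/folder1/abc.txt"
    public static String toFileUri(String unc) {
        String s = unc.replace('\\', '/');   // normalize separators
        while (s.startsWith("/")) {
            s = s.substring(1);              // drop the leading \\ of the UNC prefix
        }
        // Four slashes keep the host as part of the path component of the URI.
        return "file:////" + s;
    }
}
```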

Ningjun

From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Wednesday, March 11, 2015 2:40 AM
To: Wang, Ningjun (LNG-NPV)
Cc: java8964; user@spark.apache.org
Subject: Re: sc.textFile() on windows cannot access UNC path

I don't have a complete example for your use case, but you can see a lot of 
code showing how to use newAPIHadoopFile here: 
https://github.com/search?q=sc.newAPIHadoopFile&type=Code&utf8=%E2%9C%93

Thanks
Best Regards

On Tue, Mar 10, 2015 at 7:37 PM, Wang, Ningjun (LNG-NPV) 
ningjun.w...@lexisnexis.com wrote:
This sounds like the right approach. Is there any sample code showing how to 
use sc.newAPIHadoopFile? I am new to Spark and don't know much about Hadoop. 
I just want to read a text file from a UNC path into an RDD.

Thanks


From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Tuesday, March 10, 2015 9:14 AM
To: java8964
Cc: Wang, Ningjun (LNG-NPV); user@spark.apache.org
Subject: Re: sc.textFile() on windows cannot access UNC path

You can create your own Input Reader (using java.nio.*) and pass it to the 
sc.newAPIHadoopFile while reading.


Thanks
Best Regards

On Tue, Mar 10, 2015 at 6:28 PM, java8964 
java8...@hotmail.com wrote:
I think the work around is clear.

Using JDK 7, implement your own saveAsRemoteWinText() using java.nio.file.Path.
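The saveAsRemoteWinText() Yong mentions is not an existing Spark method; a minimal sketch of such a helper, using java.nio to write the output, might look like this (RemoteWinText and saveLines are my own illustrative names):

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class RemoteWinText {

    // Write lines to the target path using java.nio, which understands
    // UNC paths on Windows (unlike the java.io-based Hadoop Path class).
    public static void saveLines(List<String> lines, String target) throws Exception {
        Path p = Paths.get(target);
        if (p.getParent() != null) {
            Files.createDirectories(p.getParent()); // make sure the folder exists
        }
        Files.write(p, lines, StandardCharsets.UTF_8);
    }
}
```

On Windows with JDK 7+, the same call accepts a UNC target such as \\host\share\out.txt, which is the point of going through java.nio here.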

Yong

From: ningjun.w...@lexisnexis.com
To: java8...@hotmail.com; user@spark.apache.org
Subject: RE: sc.textFile() on windows cannot access UNC path
Date: Tue, 10 Mar 2015 03:02:37 +


Hi Yong



Thanks for the reply. Yes, it works with a local drive letter. But I really need 
to use a UNC path because the path is input at runtime. I cannot dynamically 
assign a drive letter to an arbitrary UNC path at runtime.



Is there any workaround so that I can use a UNC path for sc.textFile(…)?





Ningjun





From: java8964 [mailto:java8...@hotmail.com]
Sent: Monday, March 09, 2015 5:33 PM
To: Wang, Ningjun (LNG-NPV); user@spark.apache.org
Subject: RE: sc.textFile() on windows cannot access UNC path



This is a Java problem, not really Spark.



From this page: 
http://stackoverflow.com/questions/18520972/converting-java-file-url-to-file-path-platform-independent-including-u



You can see that using java.nio.* on JDK 7 fixes this issue. But the Path 
class in Hadoop uses java.io.*, instead of java.nio.



You need to manually mount your Windows remote share as a local drive, like Z:, 
then it should work.
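As a minimal, Hadoop-free illustration of where the host goes missing (this is my own sketch of the mechanism, not Hadoop's actual code path): java.net.URI preserves the leading //host inside the path of a four-slash file URI, while java.io-style path normalization on Unix-like systems collapses the doubled slash, which matches the degraded file:/10.196.119.230/... seen in the error below.

```java
import java.io.File;
import java.net.URI;

public class UncDemo {
    public static void main(String[] args) {
        // A file URI carrying a UNC host inside its path component:
        URI uri = URI.create("file:////10.196.119.230/folder1/abc.txt");
        // URI parsing keeps the host as part of the path:
        System.out.println(uri.getPath()); // //10.196.119.230/folder1/abc.txt
        // java.io.File normalization is platform-dependent: on Unix-like
        // systems the doubled slash is collapsed, losing the host.
        System.out.println(new File(uri.getPath()).getPath());
    }
}
```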



Yong



From: ningjun.w...@lexisnexis.com
To: user@spark.apache.org
Subject: sc.textFile() on windows cannot access UNC path
Date: Mon, 9 Mar 2015 21:09:38 +

I am running Spark on Windows 2008 R2. I use sc.textFile() to load a text file 
using a UNC path, and it does not work.



sc.textFile(raw"file:10.196.119.230/folder1/abc.txt", 4).count()



Input path does not exist: file:/10.196.119.230/folder1/abc.txt

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
file:/10.196.119.230/tar/Enron/enron-207-short.load

        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:203

Re: sc.textFile() on windows cannot access UNC path

2015-03-12 Thread Akhil Das
That sounds like the way of doing it. Could you try accessing a file on the UNC
path with native Java NIO code first, and make sure it is able to access it with
the URI file:10.196.119.230/folder1/abc.txt?
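A sketch of that check (a hypothetical standalone program; UncAccessCheck is my own name, and the host and path are the example values from this thread — on a machine without that share it simply reports the path as unreadable):

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class UncAccessCheck {

    // True if this JVM can read the given path via java.nio (JDK 7+).
    public static boolean canRead(String unc) {
        Path p = Paths.get(unc);
        return Files.isReadable(p);
    }

    public static void main(String[] args) throws Exception {
        String unc = "\\\\10.196.119.230\\folder1\\abc.txt"; // example from this thread
        if (canRead(unc)) {
            List<String> lines = Files.readAllLines(Paths.get(unc), StandardCharsets.UTF_8);
            System.out.println("Read " + lines.size() + " lines via java.nio");
        } else {
            System.out.println("UNC path not readable from this JVM");
        }
    }
}
```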

Thanks
Best Regards


Re: sc.textFile() on windows cannot access UNC path

2015-03-11 Thread Akhil Das
I don't have a complete example for your use case, but you can see a lot of
code showing how to use newAPIHadoopFile here:
https://github.com/search?q=sc.newAPIHadoopFile&type=Code&utf8=%E2%9C%93

Thanks
Best Regards


RE: sc.textFile() on windows cannot access UNC path

2015-03-10 Thread java8964
I think the work around is clear.
Using JDK 7, implement your own saveAsRemoteWinText() using java.nio.file.Path.
Yong


RE: sc.textFile() on windows cannot access UNC path

2015-03-09 Thread java8964
This is a Java problem, not really Spark.
From this page: 
http://stackoverflow.com/questions/18520972/converting-java-file-url-to-file-path-platform-independent-including-u
You can see that using java.nio.* on JDK 7 fixes this issue. But the Path 
class in Hadoop uses java.io.*, instead of java.nio.
You need to manually mount your Windows remote share as a local drive, like Z:, 
then it should work.
Yong


RE: sc.textFile() on windows cannot access UNC path

2015-03-09 Thread Wang, Ningjun (LNG-NPV)
Hi Yong

Thanks for the reply. Yes, it works with a local drive letter. But I really need 
to use a UNC path because the path is input at runtime. I cannot dynamically 
assign a drive letter to an arbitrary UNC path at runtime.

Is there any workaround so that I can use a UNC path for sc.textFile(...)?


Ningjun

