Re: Re: Spark streaming - textFileStream/fileStream - Get file name

2015-04-29 Thread bit1...@163.com
Correcting myself:
For SparkContext#wholeTextFiles, the RDD's elements are key-value pairs: the
key is the file path and the value is the file content.
So with SparkContext#wholeTextFiles, the RDD already carries the file
information.



bit1...@163.com
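
A minimal sketch of how the (path, content) pairs described above could be
used to keep a file name next to every line. The object FileTaggedLines and
its methods are hypothetical names made up for this illustration, not Spark
APIs. The block is plain Scala so it runs without Spark; the wholeTextFiles
call in the trailing comment is an assumption about how it would be wired in.

```scala
object FileTaggedLines {
  // Keep only the last path segment, e.g. "hdfs://nn/in/a.txt" -> "a.txt".
  def fileName(path: String): String =
    path.substring(path.lastIndexOf('/') + 1)

  // Split one file's content into non-empty lines and pair each line
  // with the name of the file it came from.
  def tag(path: String, content: String): List[(String, String)] =
    content.split("\\r?\\n").toList
      .filter(_.nonEmpty)
      .map(line => (fileName(path), line))
}

// Assumed wiring, with Spark on the classpath and sc being a SparkContext
// (not needed to run this block):
//   sc.wholeTextFiles("/some/input/dir")
//     .flatMap { case (path, content) => FileTaggedLines.tag(path, content) }
```

Note that wholeTextFiles reads each file as a single record, so this approach
is probably better suited to many small files than to a few very large ones.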
 


Re: Re: Spark streaming - textFileStream/fileStream - Get file name

2015-04-29 Thread Saisai Shao
Yes, that looks like a solution, but it is quite tricky. You have to parse the
debug string to get the file name, and it also relies on HadoopRDD to get the
file name :)
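
A hedged sketch of the parsing step mentioned above. It assumes that each
input file shows up in the debug string as a line of the form
"<path> NewHadoopRDD[<id>] ...", and that paths contain a '/' and no
whitespace. The real format is not specified in this thread and may vary
between Spark versions. DebugStringPaths, extract, and the sample text are
hypothetical, written only for this illustration.

```scala
object DebugStringPaths {
  // Hypothetical lineage dump, shaped like RDD.toDebugString output.
  // The paths, ids and source locations are illustrative only.
  val sample: String = Seq(
    "(2) MapPartitionsRDD[3] at values at Example.scala:12 []",
    " |  UnionRDD[2] at union at FileInputDStream.scala:1 []",
    " |  hdfs://host/in/a.seq NewHadoopRDD[0] at newAPIHadoopFile at FileInputDStream.scala:1 []",
    " |  hdfs://host/in/b.seq NewHadoopRDD[1] at newAPIHadoopFile at FileInputDStream.scala:1 []"
  ).mkString("\n")

  // True for tokens such as "NewHadoopRDD[0]" or "HadoopRDD[7]".
  private def isHadoopRdd(token: String): Boolean =
    token.startsWith("NewHadoopRDD[") || token.startsWith("HadoopRDD[")

  // For every line, take the token directly before the Hadoop RDD token
  // and keep it if it looks like a path (assumption: it contains '/').
  def extract(debugString: String): List[String] =
    debugString.split("\\r?\\n").toList.flatMap { line =>
      val tokens = line.trim.split("\\s+").toList
      tokens.zip(tokens.drop(1)).collect {
        case (name, rdd) if isHadoopRdd(rdd) && name.contains("/") => name
      }
    }.distinct
}
```

Because this depends on an undocumented string format, it is fragile in the
way described above: a change in how Spark names or prints the underlying
RDDs would silently return an empty list.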









Re: Re: Spark streaming - textFileStream/fileStream - Get file name

2015-04-29 Thread Akhil Das
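It is possible to access the file name, though it's a bit tricky.

import org.apache.hadoop.io.{IntWritable, LongWritable}
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat

val fstream = ssc.fileStream[LongWritable, IntWritable,
  SequenceFileInputFormat[LongWritable, IntWritable]]("/home/akhld/input/")

fstream.foreachRDD(rdd => {
  // The RDD's debug string (its lineage) includes the input file names.
  println(rdd.values.toDebugString)
})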

[image: Inline image 1]

Thanks
Best Regards








Re: Spark streaming - textFileStream/fileStream - Get file name

2015-04-28 Thread Saisai Shao
I think there is currently no API in Spark Streaming that you can use to get
the file names for file input streams. Supporting this is actually not
trivial. Maybe you could file a JIRA describing what you would like the
community to support, so that anyone who is interested can take a crack at it.

Thanks
Jerry






Re: Re: Spark streaming - textFileStream/fileStream - Get file name

2015-04-28 Thread bit1...@163.com
It looks to me that the same thing also applies to SparkContext.textFile and
SparkContext.wholeTextFiles: there is no way for an RDD to tell which file its
data came from.



bit1...@163.com
 




Re: Re: Spark streaming - textFileStream/fileStream - Get file name

2015-04-28 Thread Vadim Bichutskiy
I was wondering about the same thing.

Vadim





Re: Re: Spark streaming - textFileStream/fileStream - Get file name

2015-04-28 Thread Saisai Shao
I think it might be useful for Spark Streaming's file input stream, but I'm not
sure it is useful for SparkContext#textFile: since we specify the file
ourselves, why would we still need to know the file name?

I will open a JIRA for this feature.

Thanks
Jerry








Re: Re: Spark streaming - textFileStream/fileStream - Get file name

2015-04-28 Thread bit1...@163.com
For SparkContext#textFile, if a directory is given as the path parameter, it
will pick up all the files in that directory, so the same thing will occur.



bit1...@163.com
 






Spark streaming - textFileStream/fileStream - Get file name

2015-04-28 Thread lokeshkumar
Hi Forum,

When using Spark Streaming to watch for files in HDFS with the
textFileStream/fileStream methods, how do we get the names of the files that
these methods read?

I used textFileStream, which gives me the file contents in a JavaDStream. I
had no success with fileStream, as it throws a compilation error with Spark
version 1.3.1.

Can someone please tell me if there is an API function or any other way to
get the file names that these streaming methods read?

Thanks
Lokesh



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-textFileStream-fileStream-Get-file-name-tp22692.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org