Re: Re: Spark streaming - textFileStream/fileStream - Get file name
Correcting myself: for SparkContext#wholeTextFiles, the RDD's elements are key-value pairs in which the key is the file path and the value is the file content. So an RDD created by SparkContext#wholeTextFiles already carries the file information.

bit1...@163.com

From: Saisai Shao
Date: 2015-04-29 15:50
To: Akhil Das
CC: bit1...@163.com; Vadim Bichutskiy; lokeshkumar; user
Subject: Re: Re: Spark streaming - textFileStream/fileStream - Get file name

Yes, that looks like a solution, but it is quite tricky: you have to parse the debug string to get the file name, and it also relies on HadoopRDD to get the file name :)
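To make the point about key-value pairs concrete, here is a minimal sketch, assuming Spark is on the classpath and running with a local master. The object name, the app name and the input directory /tmp/input are hypothetical placeholders, not anything from this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WholeTextFilesDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical app name, master and input directory; adjust for your setup.
    val conf = new SparkConf().setAppName("whole-text-files-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Each element is a pair: (file path, entire file content).
      val files = sc.wholeTextFiles("/tmp/input")

      // Flatten to (file path, line) so that every record keeps its origin.
      val linesWithPath = files.flatMap { case (path, content) =>
        content.split("\n").map(line => (path, line))
      }

      linesWithPath.take(5).foreach { case (path, line) => println(s"$path: $line") }
    } finally {
      sc.stop()
    }
  }
}
```

Note that wholeTextFiles reads each file as a single record, so this approach suits many small files rather than a few very large ones.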
Re: Re: Spark streaming - textFileStream/fileStream - Get file name
Yes, that looks like a solution, but it is quite tricky: you have to parse the debug string to get the file name, and it also relies on HadoopRDD to get the file name :)

2015-04-29 14:52 GMT+08:00 Akhil Das ak...@sigmoidanalytics.com:

It is possible to access the file name; it's a bit tricky, though.

    import org.apache.hadoop.io.{IntWritable, LongWritable}
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat

    // ssc is an existing StreamingContext.
    val fstream = ssc.fileStream[LongWritable, IntWritable, SequenceFileInputFormat[LongWritable, IntWritable]]("/home/akhld/input/")
    fstream.foreach(x => {
      // You can get it with this object: the file name appears in the RDD's debug string.
      println(x.values.toDebugString)
    })

Thanks
Best Regards
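As a rough illustration of what "parsing the debug string" could look like, here is a minimal sketch. It assumes that each input file shows up in the toDebugString output as the name printed just before a HadoopRDD or NewHadoopRDD entry. That layout is not a documented, stable format and may differ between Spark versions. The sample string, the object name DebugStringPaths and the helper extractPaths are all hypothetical, made up for illustration.

```scala
object DebugStringPaths {
  // Matches "<name> HadoopRDD[" or "<name> NewHadoopRDD[" and captures <name>.
  // Assumption: the name printed before the RDD type is the input path.
  // Tokens containing '|' are skipped so tree-drawing characters are not captured.
  private val HadoopRddName = """([^\s|]+)\s+(?:New)?HadoopRDD\[""".r

  /** Returns every token that directly precedes a (New)HadoopRDD entry. */
  def extractPaths(debugString: String): Seq[String] =
    HadoopRddName.findAllMatchIn(debugString).map(_.group(1)).toList

  def main(args: Array[String]): Unit = {
    // Hypothetical, hand-written sample; real toDebugString output may differ.
    val sample = Seq(
      "(2) UnionRDD[4] at foreach at Demo.scala:12 []",
      " |  hdfs://nn/input/a.seq NewHadoopRDD[2] at foreach at Demo.scala:12 []",
      " |  hdfs://nn/input/b.seq NewHadoopRDD[3] at foreach at Demo.scala:12 []"
    ).mkString("\n")

    extractPaths(sample).foreach(println)
  }
}
```

In a streaming job one would call something like extractPaths(rdd.toDebugString) inside the per-batch function. Because it depends on an internal string layout, this is fragile, which is the "quite tricky" part mentioned above.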
Re: Re: Spark streaming - textFileStream/fileStream - Get file name
It is possible to access the file name; it's a bit tricky, though.

    import org.apache.hadoop.io.{IntWritable, LongWritable}
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat

    // ssc is an existing StreamingContext.
    val fstream = ssc.fileStream[LongWritable, IntWritable, SequenceFileInputFormat[LongWritable, IntWritable]]("/home/akhld/input/")
    fstream.foreach(x => {
      // You can get it with this object: the file name appears in the RDD's debug string.
      println(x.values.toDebugString)
    })

Thanks
Best Regards

On Wed, Apr 29, 2015 at 8:33 AM, bit1...@163.com wrote:

For SparkContext#textFile, if a directory is given as the path parameter, it will pick up the files in that directory, so the same thing will occur.

bit1...@163.com
Re: Spark streaming - textFileStream/fileStream - Get file name
I think there is currently no API in Spark Streaming that you can use to get the file names for file input streams. Actually, it is not trivial to support this. Maybe you could file a JIRA describing what you would like the community to support, so that anyone who is interested can take a crack at it.

Thanks
Jerry

2015-04-29 0:13 GMT+08:00 lokeshkumar lok...@dataken.net:

Hi Forum,

Using Spark Streaming and listening to files in HDFS with the textFileStream/fileStream methods, how do we get the names of the files that these methods read? I used textFileStream, which gives the file contents in a JavaDStream, and I had no success with fileStream, as it throws a compilation error with Spark version 1.3.1. Can someone please tell me whether there is an API function or any other way to get the file names that these streaming methods read?

Thanks
Lokesh
Re: Re: Spark streaming - textFileStream/fileStream - Get file name
It looks to me that the same thing also applies to SparkContext.textFile and SparkContext.wholeTextFiles: there is no way for an RDD to find out which file its data came from.

bit1...@163.com

From: Saisai Shao
Date: 2015-04-29 10:10
To: lokeshkumar
CC: spark users
Subject: Re: Spark streaming - textFileStream/fileStream - Get file name

I think there is currently no API in Spark Streaming that you can use to get the file names for file input streams. Actually, it is not trivial to support this. Maybe you could file a JIRA describing what you would like the community to support, so that anyone who is interested can take a crack at it.

Thanks
Jerry
Re: Re: Spark streaming - textFileStream/fileStream - Get file name
I was wondering about the same thing.

Vadim

On Tue, Apr 28, 2015 at 10:19 PM, bit1...@163.com wrote:

It looks to me that the same thing also applies to SparkContext.textFile and SparkContext.wholeTextFiles: there is no way for an RDD to find out which file its data came from.

bit1...@163.com
Re: Re: Spark streaming - textFileStream/fileStream - Get file name
I think it might be useful for Spark Streaming's file input stream, but I am not sure whether it is useful for SparkContext#textFile: since we specify the file ourselves, why would we still need to know the file name? I will open a JIRA to propose this feature.

Thanks
Jerry

2015-04-29 10:49 GMT+08:00 Vadim Bichutskiy vadim.bichuts...@gmail.com:

I was wondering about the same thing.

Vadim
Re: Re: Spark streaming - textFileStream/fileStream - Get file name
For SparkContext#textFile, if a directory is given as the path parameter, it will pick up the files in that directory, so the same thing will occur.

bit1...@163.com

From: Saisai Shao
Date: 2015-04-29 10:54
To: Vadim Bichutskiy
CC: bit1...@163.com; lokeshkumar; user
Subject: Re: Re: Spark streaming - textFileStream/fileStream - Get file name

I think it might be useful for Spark Streaming's file input stream, but I am not sure whether it is useful for SparkContext#textFile: since we specify the file ourselves, why would we still need to know the file name? I will open a JIRA to propose this feature.

Thanks
Jerry
Spark streaming - textFileStream/fileStream - Get file name
Hi Forum,

Using Spark Streaming and listening to files in HDFS with the textFileStream/fileStream methods, how do we get the names of the files that these methods read? I used textFileStream, which gives the file contents in a JavaDStream, and I had no success with fileStream, as it throws a compilation error with Spark version 1.3.1. Can someone please tell me whether there is an API function or any other way to get the file names that these streaming methods read?

Thanks
Lokesh

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-textFileStream-fileStream-Get-file-name-tp22692.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org