Hi,
I am reading files using textFileStream, performing some action on
them and then saving the result to HDFS using saveAsTextFile.
But whenever there is no file to read, Spark will write an empty RDD
( [] ) to HDFS.
So, how do I handle the empty RDD?
I checked rdd.isEmpty() and rdd.count > 0, but …
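A common pattern for this (a sketch, assuming an existing SparkContext `sc`; the paths and the map step are placeholders, not from the original code) is to guard the save inside foreachRDD:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch: skip writing to HDFS when a batch produced an empty RDD.
val ssc = new StreamingContext(sc, Seconds(10))
val lines = ssc.textFileStream("hdfs:///input/dir")
val processed = lines.map(_.toUpperCase)  // placeholder for "some action"

processed.foreachRDD { (rdd, time) =>
  // isEmpty() only looks at the first non-empty partition, so it is
  // cheaper than count() for this check.
  if (!rdd.isEmpty()) {
    rdd.saveAsTextFile(s"hdfs:///output/dir/batch-${time.milliseconds}")
  }
}
```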
It means pretty much what it says: your code does not have runtime
class information about K at this point, and it is required.
On Thu, Apr 7, 2016 at 5:52 AM, Tenghuan He <tenghua...@gmail.com> wrote:
Hi all,
I want to create an empty RDD and partition it:
val buffer: RDD[(K, (V, Int))] = base.context.emptyRDD[(K, (V,
Int))].partitionBy(new HashPartitioner(5))
but got Error: No ClassTag available for K.
Scala needs information about K at runtime, so how do I solve this?
Thanks
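The usual fix is to add ClassTag context bounds so the runtime class information travels with the type parameters. A sketch (the wrapper method is hypothetical, not from the original code):

```scala
import scala.reflect.ClassTag

import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.rdd.RDD

// Sketch: the `: ClassTag` context bounds supply the runtime class info
// that emptyRDD (and the pair-RDD machinery behind partitionBy) require.
def makeBuffer[K : ClassTag, V : ClassTag](sc: SparkContext): RDD[(K, (V, Int))] =
  sc.emptyRDD[(K, (V, Int))].partitionBy(new HashPartitioner(5))
```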
…TY")} else
{rdd.collect().foreach(event => println(event.getRepo.getName + " " +
event.getId))}
})
ctx.start()
ctx.awaitTermination()
Thanks in advance!
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Travis-CI-and-GitHub-custom-recei
…to verify the schema:
https://github.com/apache/spark/blob/branch-1.3/python/pyspark/sql/context.py#L299
Before I attempt to extend the Scala code to handle an empty RDD or provide
an empty DataFrame that can be registered, I was wondering what people
recommend in this case. Perhaps there's…
It worked, Zhou.
On Mon, Jul 6, 2015 at 10:43 PM, Wei Zhou zhweisop...@gmail.com wrote:
I used
val output: RDD[(DetailInputRecord, VISummary)] =
  sc.emptyRDD[(DetailInputRecord, VISummary)]
to create an empty RDD before. Give it a try, it might work for you too.
This should work:
val output: RDD[(DetailInputRecord, VISummary)] =
  sc.parallelize(Seq.empty[(DetailInputRecord, VISummary)])
On Mon, Jul 6, 2015 at 5:11 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
I need to return an empty RDD of type
val output: RDD[(DetailInputRecord, VISummary)]
This does not work:
val output: RDD[(DetailInputRecord, VISummary)] = new RDD()
as RDD is an abstract class.
How do I create an empty RDD?
--
Deepak
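Both suggestions from the thread can be sketched side by side (the case classes below are stand-ins, since the real DetailInputRecord / VISummary definitions are not shown):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Stand-in types: the real field lists are not in the thread.
case class DetailInputRecord(id: String)
case class VISummary(count: Long)

// emptyRDD returns an RDD with zero partitions.
def emptyOutput(sc: SparkContext): RDD[(DetailInputRecord, VISummary)] =
  sc.emptyRDD[(DetailInputRecord, VISummary)]

// parallelize(Seq.empty) returns defaultParallelism partitions, all empty.
def emptyOutput2(sc: SparkContext): RDD[(DetailInputRecord, VISummary)] =
  sc.parallelize(Seq.empty[(DetailInputRecord, VISummary)])
```

The two differ only in partitioning: `emptyRDD` has no partitions at all, while `parallelize(Seq.empty)` creates the default number of (empty) partitions, which can matter for operations that inspect partition counts.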
When I call *transform* or *foreachRDD* on a *DStream*, I keep getting an
error that I have an empty RDD, which makes sense since my batch interval
may be smaller than the rate at which new data are coming in. How do I guard
against it?
Thanks,
Vadim
Aah yes. The jsonRDD method needs to walk through the whole RDD to
understand the schema, and does not work if there is no data in it. Making
sure there is data in it using take(1) should work.
TD
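A sketch of that guard, assuming a Spark 1.x SQLContext (jsonRDD was removed in later versions); `stream` is a hypothetical DStream[String] of JSON records:

```scala
import org.apache.spark.sql.SQLContext

// Sketch: only run schema inference when the batch actually has data.
stream.foreachRDD { rdd =>
  if (rdd.take(1).nonEmpty) {          // cheap non-emptiness check
    val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
    val df = sqlContext.jsonRDD(rdd)   // schema inference walks the RDD
    df.registerTempTable("events")
  }
}
```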
Thanks TD!
…12, 2015 at 9:50 PM, Xuelin Cao xuelincao2...@gmail.com wrote:
Hi,
I'd like to create a transform function that converts RDD[String] to
RDD[Int].
Occasionally, the input RDD could be an empty RDD. I just want to
directly create an empty RDD[Int] if the input RDD is empty, and I don't
want to return None as the result.
Is there an easy way to do that?
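One way to sketch this (the parsing step is a placeholder):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Sketch: return an empty RDD[Int] for empty input, never an Option.
def convert(sc: SparkContext, input: RDD[String]): RDD[Int] =
  if (input.isEmpty()) sc.emptyRDD[Int]
  else input.map(_.toInt)   // placeholder parsing logic
```

Note that `input.map(_.toInt)` on an empty input already yields an empty RDD[Int], so the explicit branch is only needed if a zero-partition EmptyRDD is specifically wanted.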
Streaming empty RDD issue
Hi Experts,
I am using Spark Streaming to integrate Kafka for real-time data processing.
I am facing some issues related to Spark Streaming, so I want to know how
we can detect:
1) Our connection has been lost
2) Our receiver is down
3) Spark Streaming has no new messages
I will be glad to hear from you and will be thankful to you.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-empty-RDD-issue-tp20329.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
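For (1) and (2), a StreamingListener can surface receiver problems, and (3) can be approximated by checking each batch's record count. A sketch (the println logging is a placeholder):

```scala
import org.apache.spark.streaming.scheduler._

// Sketch: hook into streaming lifecycle events to detect failures.
class HealthListener extends StreamingListener {
  override def onReceiverError(e: StreamingListenerReceiverError): Unit =
    println(s"Receiver error: ${e.receiverInfo.lastErrorMessage}")

  override def onReceiverStopped(s: StreamingListenerReceiverStopped): Unit =
    println(s"Receiver stopped: ${s.receiverInfo.name}")

  override def onBatchCompleted(b: StreamingListenerBatchCompleted): Unit =
    if (b.batchInfo.numRecords == 0) println("Batch had no new messages")
}

// Register on the StreamingContext (`ssc` is assumed to exist):
// ssc.addStreamingListener(new HealthListener)
```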
I think you could use `repartition` to make sure there would be no empty
partitions.
You could also try `coalesce` to combine partitions, but it can't guarantee
that there are no empty partitions.
Best Regards,
Yi Tian
tianyi.asiai...@gmail.com
On Oct 18, 2014, at 20:30, …
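A sketch of the difference (the filter and partition counts are illustrative):

```scala
import org.apache.spark.rdd.RDD

// Sketch: after a filter, some partitions may end up empty.
def compact(data: RDD[Int]): (RDD[Int], RDD[Int]) = {
  val filtered = data.filter(_ % 2 == 0)

  // repartition(n) performs a full shuffle into n fresh partitions,
  // redistributing rows evenly so no partition stays empty
  // (assuming there are at least n rows).
  val evened = filtered.repartition(4)

  // coalesce(n) merges existing partitions without a shuffle: cheaper,
  // but merged partitions can still end up empty.
  val merged = filtered.coalesce(4)

  (evened, merged)
}
```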