Hi,
can you please take a screenshot and show us the number of records that
the streaming program is reading from the source? If I am not mistaken,
with your 5-second processing-time trigger it should be writing records
to the output location every 5 seconds.
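
For example (a rough sketch, assuming the query handle q from your
snippet is still in scope), you can poll the query's progress to see how
many records the most recent trigger pulled in:

    // lastProgress holds the metrics of the last completed trigger;
    // numInputRows is the number of records read from the source in that batch.
    val p = q.lastProgress
    if (p != null) {
      println(s"batch ${p.batchId}: ${p.numInputRows} input rows")
    }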

Also, it may help to check whether you have permission to write to the
output location.
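
One quick way to test that (just a sketch; _perm_test is a made-up probe
file name, and this assumes the GCS connector and your credentials are
already configured on the cluster):

    import org.apache.hadoop.fs.{FileSystem, Path}

    // Create and delete a small probe file under the output path;
    // this throws an exception if you lack write permission.
    val probe = new Path("gs://testdata/raw/_perm_test")
    val fs = probe.getFileSystem(spark.sparkContext.hadoopConfiguration)
    fs.create(probe).close()
    fs.delete(probe, false)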


Thanks and Regards,
Gourav Sengupta

On Fri, Apr 22, 2022 at 3:57 PM hsy...@gmail.com <hsy...@gmail.com> wrote:

> Hello all,
>
> I’m just trying to build a pipeline that reads data from a streaming
> source and writes it out as ORC files, but I don’t see any files written
> to the file system, nor any exceptions.
>
> Here is an example
>
> import java.util.concurrent.TimeUnit
> import org.apache.spark.sql.streaming.Trigger
>
> val df = spark.readStream
>   .format("...")
>   .option("topic", "Some topic")
>   .load()
>
> val q = df.writeStream
>   .format("orc")
>   .option("path", "gs://testdata/raw")
>   .option("checkpointLocation", "gs://testdata/raw_chk")
>   .trigger(Trigger.ProcessingTime(5, TimeUnit.SECONDS))
>   .start()
>
> q.awaitTermination(1200000)
> q.stop()
>
>
> I couldn’t find any files until the 1200 seconds were over. Does that
> mean all the data is cached in memory? Even if I keep the pipeline
> running, I see no files being flushed to the file system.
>
> How do I control how often Spark Structured Streaming writes to disk?
>
> Thanks!
>
