Hi Bolke,

I would argue that Spark is not the right level of abstraction for doing
this. I would create a wrapper around the particular filesystem:
http://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html
That way you can wrap the LocalFileSystem if data is written to local
disk, the DistributedFileSystem when it is written to HDFS, and many
object stores implement this interface as well. My 2¢
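
A minimal sketch of the wrapper idea, assuming HDFS. LineageFileSystem,
recordRead and recordWrite are hypothetical names I am making up here;
FilterFileSystem is the stock Hadoop class that delegates every call to an
underlying FileSystem, so you only override the calls you care about:

import org.apache.hadoop.fs.{FSDataInputStream, FSDataOutputStream, FilterFileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission
import org.apache.hadoop.hdfs.DistributedFileSystem
import org.apache.hadoop.util.Progressable

class LineageFileSystem extends FilterFileSystem(new DistributedFileSystem) {

  // Record every path that is opened for reading.
  override def open(f: Path, bufferSize: Int): FSDataInputStream = {
    recordRead(f)
    super.open(f, bufferSize)
  }

  // Record every path that is created for writing.
  override def create(f: Path, permission: FsPermission, overwrite: Boolean,
                      bufferSize: Int, replication: Short, blockSize: Long,
                      progress: Progressable): FSDataOutputStream = {
    recordWrite(f)
    super.create(f, permission, overwrite, bufferSize, replication,
      blockSize, progress)
  }

  // Hypothetical hooks: ship the paths to whatever lineage store you use.
  private def recordRead(f: Path): Unit = println(s"lineage: read  $f")
  private def recordWrite(f: Path): Unit = println(s"lineage: write $f")
}

You would then point Hadoop at the wrapper, e.g. by setting
spark.hadoop.fs.hdfs.impl to the fully qualified class name, so every read
and write on hdfs:// URIs goes through it.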

Cheers, Fokko

On Mon, Oct 15, 2018 at 18:58, Bolke de Bruin <bdbr...@gmail.com> wrote:

> Hi,
>
> Apologies upfront if this should have gone to user@ but it seems a
> developer question so here goes.
>
> We are trying to improve a listener that tracks lineage across our
> platform. This requires knowing where data comes from and where it goes.
> E.g.:
>
> sc.setLogLevel("INFO")
> val data = sc.textFile("hdfs://migration/staffingsec/Mydata.gz")
> data.saveAsTextFile("hdfs://datalab/user/xxx")
>
> In this case we would like to know that Spark picked up “Mydata.gz” and
> wrote it to “xxx”. Of course more complex examples are possible.
>
> In the particular case above, Spark (2.3.2) does not seem to trigger any
> events, or at least none that we know of that carry the relevant
> information.
>
> Is that a correct assessment? What can we do to get that information
> without knowing the job's code upfront? Should we provide a patch?
>
> Thanks
> Bolke
>
> Sent from my iPad
