Hi Bolke,

I would argue that Spark is not the right level of abstraction for doing this. I would create a wrapper around the particular filesystem: http://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html That way you can wrap the LocalFileSystem if data is written to local disk, the DistributedFileSystem when it is written to HDFS, and many object stores implement this interface as well.

My 2¢
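Something along these lines, as a rough, untested sketch (LineageFileSystem and the record callback are illustrative names of mine, not existing APIs):

import org.apache.hadoop.fs.{FSDataInputStream, FSDataOutputStream, FileSystem, FilterFileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission
import org.apache.hadoop.util.Progressable

// Delegates everything to the wrapped FileSystem, but reports each
// path that is opened for reading or created for writing.
class LineageFileSystem(fs: FileSystem, record: (String, Path) => Unit)
    extends FilterFileSystem(fs) {

  // Reads, e.g. triggered by sc.textFile(...)
  override def open(f: Path, bufferSize: Int): FSDataInputStream = {
    record("read", f)
    super.open(f, bufferSize)
  }

  // Writes, e.g. triggered by rdd.saveAsTextFile(...)
  override def create(f: Path, permission: FsPermission, overwrite: Boolean,
                      bufferSize: Int, replication: Short, blockSize: Long,
                      progress: Progressable): FSDataOutputStream = {
    record("write", f)
    super.create(f, permission, overwrite, bufferSize, blockSize,
      progress) match { case s => s } // see note below
  }
}

(One caveat: if you register the wrapper via the fs.&lt;scheme&gt;.impl configuration key so Hadoop picks it up by reflection, it also needs a no-arg constructor plus initialize(), which I left out here for brevity.)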
Cheers, Fokko

On Mon, 15 Oct 2018 at 18:58, Bolke de Bruin <bdbr...@gmail.com> wrote:

> Hi,
>
> Apologies upfront if this should have gone to user@, but it seems like a
> developer question, so here goes.
>
> We are trying to improve a listener to track lineage across our platform.
> This requires tracking where data comes from and where it goes to. E.g.
>
> sc.setLogLevel("INFO")
> val data = sc.textFile("hdfs://migration/staffingsec/Mydata.gz")
> data.saveAsTextFile("hdfs://datalab/user/xxx")
>
> In this case we would like to know that Spark picked up “Mydata.gz” and
> wrote it to “xxx”. Of course, more complex examples are possible.
>
> In the particular case above, Spark (2.3.2) does not seem to trigger any
> events, or at least none that we know of that give us the relevant
> information.
>
> Is that a correct assessment? What can we do to get that information
> without knowing the code upfront? Should we provide a patch?
>
> Thanks
> Bolke
>
> Sent from my iPad