Hi!

Quick questions:
- Which SDK are you using?
- Is this batch or streaming?

As JB mentioned, TextIO can work with compressed files that contain text. Nothing currently handles the double decompression that I believe you're looking for.

TextIO for Java can also "watch" a directory for new files. If you're able to decompress your first zip file (outside of your pipeline) into a directory that your pipeline is watching, you may be able to use that as a workaround. Does that sound like it would work for you?

Finally, if you want to implement a transform that does all of your logic, that sounds like SplittableDoFn material. In that case, someone who knows SDF better can give you guidance (or correct me if my suggestions are wrong).

Best
-P.

On Thu, Mar 15, 2018, 8:09 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi
>
> TextIO supports compressed files. Do you want to read the files as text?
>
> Can you detail the use case a bit?
>
> Thanks
> Regards
> JB
> On March 15, 2018, at 18:28, Shirish Jamthe <sjam...@google.com> wrote:
>>
>> Hi,
>>
>> My input is a tar.gz or .zip file which contains thousands of tar.gz
>> files and other files.
>> I would like to extract the tar.gz files from the tar.
>>
>> Is there a transform that can do that? I couldn't find one.
>> If not, is it in the works? Any pointers to start work on it?
>>
>> thanks
>>
>
--
Got feedback? go/pabloem-feedback
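
P.S. For the workaround suggested in the reply above (unpacking the outer archive outside the pipeline into a directory the pipeline watches), here is a rough sketch using only Python's standard `tarfile` module. It is not a Beam transform and not something Beam provides. The function name `extract_inner_archives`, the `watch_dir` argument, and the `.tar.gz` suffix filter are all hypothetical choices made for illustration, and it assumes the outer file is a tar archive (a `.zip` outer file would need `zipfile` instead).

```python
import os
import shutil
import tarfile


def extract_inner_archives(outer_path, watch_dir, suffix=".tar.gz"):
    """Copy each member of outer_path whose name ends with suffix into watch_dir.

    Hypothetical helper: returns the list of paths it wrote.
    """
    os.makedirs(watch_dir, exist_ok=True)
    written = []
    # "r:*" lets tarfile detect the outer compression (gzip, bz2, or none).
    with tarfile.open(outer_path, "r:*") as outer:
        for member in outer:
            # Skip directories, links, and the "other files" in the archive.
            if not (member.isfile() and member.name.endswith(suffix)):
                continue
            # Keep only the base name so a member path cannot escape watch_dir.
            # Assumption: inner archive base names are unique; otherwise
            # later members overwrite earlier ones.
            target = os.path.join(watch_dir, os.path.basename(member.name))
            source = outer.extractfile(member)
            with source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            written.append(target)
    return written
```

Note that this only removes the first layer: the files written to `watch_dir` are still tar.gz archives, so whether the pipeline can read them directly depends on what they contain.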