Re: Container Extractor?

Nick Burch Thu, 02 Sep 2010 03:05:49 -0700

On Wed, 1 Sep 2010, Andrzej Bialecki wrote:

I was thinking recursive could mean different things. For zip files, tar
files, etc., it would probably just mean root directory only vs. descending
into all directories.
There are no directories in these formats - it's just a flat namespace that just happens to use the filesystem conventions. Java APIs for these containers also provide only simple iterators. So I'm not sure if there's any benefit to this distinction here... maybe provide a FilenameFilter to control what path names to process?
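
A minimal Java sketch of the flat-namespace point, using only java.util.zip: one pass over ZipInputStream sees every entry, and "directories" are just name prefixes that a filter can match on. A java.util.function.Predicate<String> stands in for the suggested FilenameFilter (whose accept(File, String) signature assumes a real directory); the class and method names are hypothetical, not Tika API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class FlatZipLister {
    // Zip entry names are plain strings like "docs/img/b.png"; the "directories"
    // are only a naming convention, so one flat pass over the entries sees everything.
    static List<String> list(InputStream in, Predicate<String> nameFilter) throws IOException {
        List<String> accepted = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Skip explicit directory markers (names ending in "/").
                if (!entry.isDirectory() && nameFilter.test(entry.getName())) {
                    accepted.add(entry.getName());
                }
            }
        }
        return accepted;
    }

    // Builds a small in-memory zip for the demo; names ending in "/" become directory entries.
    static byte[] makeZip(String... names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (String name : names) {
                out.putNextEntry(new ZipEntry(name));
                if (!name.endsWith("/")) {
                    out.write(name.getBytes(StandardCharsets.UTF_8));
                }
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = makeZip("docs/", "docs/a.txt", "docs/img/b.png", "c.txt");
        // Select by path name only; prints [docs/a.txt, c.txt]
        System.out.println(list(new ByteArrayInputStream(zip), name -> name.endsWith(".txt")));
    }
}
```

Selecting a "directory" is then just a prefix test such as name.startsWith("docs/"), which covers the root-vs-all-directories case without a separate recursion switch.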


OK, looks like a directory descent on/off isn't a great fit.

I guess we'll want to provide two ways to filter, one by filename (which is normally available), and one by mime type (which is sometimes available). Or I guess a callback of "do you want this one?" where we pass in all the information we have to hand. Any thoughts?
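
One way the "do you want this one?" callback could look, sketched in Java under the assumption that the extractor knows a filename (usually) and a mime type (sometimes, so it may be null). EmbeddedSelector, byExtension, byMimeType and SelectorDemo are hypothetical names for illustration, not an existing Tika interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical callback: the extractor hands over whatever it knows about an
// embedded item and asks "do you want this one?".
interface EmbeddedSelector {
    // filename is normally available; mimeType is often null.
    boolean wanted(String filename, String mimeType);

    // Filter by filename extension (case-insensitive).
    static EmbeddedSelector byExtension(String... extensions) {
        return (filename, mimeType) -> {
            if (filename == null) {
                return false;
            }
            String lower = filename.toLowerCase(Locale.ROOT);
            for (String ext : extensions) {
                if (lower.endsWith(ext)) {
                    return true;
                }
            }
            return false;
        };
    }

    // Filter by mime type; items with no known type are rejected.
    static EmbeddedSelector byMimeType(Set<String> types) {
        return (filename, mimeType) -> mimeType != null && types.contains(mimeType);
    }

    // Accept an item if either selector wants it.
    default EmbeddedSelector or(EmbeddedSelector other) {
        return (filename, mimeType) -> wanted(filename, mimeType) || other.wanted(filename, mimeType);
    }
}

class SelectorDemo {
    // Applies the selector to (filename, mimeType) pairs, as an extractor might
    // while walking a container; returns a label for each accepted item.
    static List<String> select(String[][] found, EmbeddedSelector selector) {
        List<String> picked = new ArrayList<>();
        for (String[] item : found) {
            String filename = item[0];
            String mimeType = item[1];
            if (selector.wanted(filename, mimeType)) {
                picked.add(filename != null ? filename : "(unnamed " + mimeType + ")");
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        String[][] found = {
            {"report.pdf", null},
            {"image1.png", "image/png"},
            {null, "application/msword"},
            {"notes.txt", "text/plain"},
        };
        EmbeddedSelector selector = EmbeddedSelector.byExtension(".pdf")
                .or(EmbeddedSelector.byMimeType(Set.of("application/msword")));
        // prints [report.pdf, (unnamed application/msword)]
        System.out.println(select(found, selector));
    }
}
```

The single callback subsumes both filter styles: the filename and mime type filters become two ready-made implementations, and callers can combine them or write their own.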

On the other hand I see a benefit in having an option to automatically descend into embedded archives.

So we'd have some sort of filtering, and the descend yes/no option? For a zip, the former exposes all files from all "directories", and the latter will cause it to descend into both embedded zips and other embedded containers like .doc? For a .docx, the former exposes all embedded files (but none of the OOXML file format stuff), and the latter controls whether embedded office documents are themselves processed?
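
A rough Java sketch of the descend yes/no option, restricted to zips nested inside zips: with descend off only the top-level entries are listed, with it on any entry that is itself a zip is opened and walked too. Everything here (class and method names, the "!/" separator in nested paths, sniffing the "PK\3\4" signature instead of real type detection) is an illustrative assumption; descending into other containers such as .doc would need format-specific code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class NestedZipWalker {
    // Zip local file headers start with "PK\3\4"; this sniff is a stand-in
    // (assumption) for proper type detection.
    static boolean looksLikeZip(byte[] data) {
        return data.length >= 4 && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
    }

    // Lists every file entry; if descend is true, entries that are themselves
    // zips are opened and listed too, with "!/" separating the nesting levels.
    static void walk(InputStream in, String prefix, boolean descend, List<String> out) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                String path = prefix + entry.getName();
                out.add(path);
                if (descend) {
                    byte[] data = zip.readAllBytes(); // reads only the current entry
                    if (looksLikeZip(data)) {
                        walk(new ByteArrayInputStream(data), path + "!/", true, out);
                    }
                }
            }
        }
    }

    static List<String> list(byte[] zipBytes, boolean descend) throws IOException {
        List<String> out = new ArrayList<>();
        walk(new ByteArrayInputStream(zipBytes), "", descend, out);
        return out;
    }

    // Builds an in-memory zip from name -> content pairs, preserving insertion order.
    static byte[] zipOf(Map<String, byte[]> entries) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                out.putNextEntry(new ZipEntry(e.getKey()));
                out.write(e.getValue());
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // A zip holding a.txt plus inner.zip, which itself holds deep.txt.
    static byte[] sample() throws IOException {
        Map<String, byte[]> inner = new LinkedHashMap<>();
        inner.put("deep.txt", "hello".getBytes(StandardCharsets.UTF_8));
        Map<String, byte[]> outer = new LinkedHashMap<>();
        outer.put("a.txt", "top".getBytes(StandardCharsets.UTF_8));
        outer.put("inner.zip", zipOf(inner));
        return zipOf(outer);
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = sample();
        System.out.println(list(zip, false)); // [a.txt, inner.zip]
        System.out.println(list(zip, true));  // [a.txt, inner.zip, inner.zip!/deep.txt]
    }
}
```

Because the nested entry is read fully into memory before recursing, this trades memory for simplicity; a streaming variant would wrap the current entry in a non-closing stream instead.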

For OLE2, it would mean checking embedded documents of
embedded documents (normally, but not always, by means of descending into
child directories). Maybe there's a clearer name for this sort of thing?
OLE2 is nothing special, it's the same with other archive types, you can always have embedded archives within archives.

The OLE2 files aren't always so nice. Some store embedded files as directory entries, some stash them away in records...


Nick
