On Wed, 1 Sep 2010, Andrzej Bialecki wrote:
I was thinking recursive could mean different things. For zip files, tar
files etc, it would probably just mean root directory vs descend into
all directories.

There are no directories in these formats; it's a flat namespace that happens to use filesystem path conventions. The Java APIs for these containers also provide only simple iterators. So I'm not sure there's any benefit to this distinction here... maybe provide a FilenameFilter to control which path names to process?
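
A rough sketch of that idea, assuming a zip read through java.util.zip.ZipInputStream. The class name FlatZipListing and the way each entry path is split into a parent and a file name before calling the filter are illustrative assumptions, not anything that exists in the project:

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: lists zip entries, letting a FilenameFilter pick paths. */
public class FlatZipListing {

    /** Returns the full path of every non-directory entry the filter accepts. */
    public static List<String> list(InputStream in, FilenameFilter filter) throws IOException {
        List<String> accepted = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            // The container only offers a flat iterator over path-like names.
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // a "directory" is just a name ending in '/'
                }
                String path = entry.getName();
                int slash = path.lastIndexOf('/');
                // Assumption: pass the parent part as the File argument, null at the root.
                File parent = slash < 0 ? null : new File(path.substring(0, slash));
                String name = path.substring(slash + 1);
                if (filter.accept(parent, name)) {
                    accepted.add(path);
                }
            }
        }
        return accepted;
    }
}
```

With this, "root only" versus "all directories" is just a choice of filter, for example (dir, name) -> dir == null for entries at the root.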

OK, looks like a directory descent on/off isn't a great fit.

I guess we'll want to provide two ways to filter, one by filename (which is normally available), and one by mime type (which is sometimes available). Or I guess a callback of "do you want this one?" where we pass in all the information we have to hand. Any thoughts?
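
To make the callback idea concrete, here is a minimal sketch. All of the names (EntrySelection, EntryInfo, EntrySelector, byNameSuffix, byMimeType) are made up for illustration and are not an existing API; the only assumption taken from the above is that the name is normally present and the mime type only sometimes:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a "do you want this one?" callback. */
public class EntrySelection {

    /** Everything we have to hand about an entry; either field may be null. */
    public static final class EntryInfo {
        public final String name;     // normally available
        public final String mimeType; // only sometimes available

        public EntryInfo(String name, String mimeType) {
            this.name = name;
            this.mimeType = mimeType;
        }
    }

    /** The callback itself: return true to process the entry. */
    public interface EntrySelector {
        boolean select(EntryInfo info);
    }

    /** Filter by filename; entries without a name are rejected. */
    public static EntrySelector byNameSuffix(String suffix) {
        return info -> info.name != null && info.name.endsWith(suffix);
    }

    /** Filter by mime type; entries without a known type are rejected. */
    public static EntrySelector byMimeType(String mimeType) {
        return info -> mimeType.equals(info.mimeType);
    }

    /** Applies a selector to a list of entries, keeping their order. */
    public static List<EntryInfo> filter(List<EntryInfo> entries, EntrySelector selector) {
        List<EntryInfo> kept = new ArrayList<>();
        for (EntryInfo info : entries) {
            if (selector.select(info)) {
                kept.add(info);
            }
        }
        return kept;
    }
}
```

The two simple filters then become special cases of the one callback, and a caller who needs something odd can write their own selector.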

On the other hand I see a benefit in having an option to automatically descend into embedded archives.

So we'd have some sort of filtering, plus the descend yes/no option? For a zip, the former exposes all files from all "directories", and the latter causes it to descend into both embedded zips and other embedded containers like .doc? For a .docx, the former exposes all embedded files (but none of the OOXML file format stuff), and the latter controls whether other embedded office documents are processed?
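
For the zip-in-zip case only, the descend option might look something like the sketch below. NestedZipWalker and the "!/" path separator are invented for illustration, and spotting an embedded zip by its ".zip" suffix is a simplifying assumption; real code would more likely sniff the content type, and this does nothing for .doc or .docx:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical sketch: list zip entries, optionally descending into nested zips. */
public class NestedZipWalker {

    /** Lists all file entries; the caller remains responsible for closing the stream. */
    public static List<String> walk(InputStream in, boolean descend) throws IOException {
        List<String> paths = new ArrayList<>();
        walk(new ZipInputStream(in), "", descend, paths);
        return paths;
    }

    private static void walk(ZipInputStream zip, String prefix, boolean descend,
                             List<String> paths) throws IOException {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            String path = prefix + entry.getName();
            paths.add(path);
            // Assumption: an embedded zip is recognised by its name alone.
            boolean looksLikeZip = entry.getName().toLowerCase(Locale.ROOT).endsWith(".zip");
            if (descend && looksLikeZip) {
                // Read the nested archive straight from the current entry's data.
                // The nested stream is deliberately not closed: closing it would
                // also close the outer stream we are still iterating.
                walk(new ZipInputStream(zip), path + "!/", descend, paths);
            }
        }
    }
}
```

With descend off, an embedded archive shows up as one ordinary entry; with it on, its contents are listed right after it under a prefixed path.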

For OLE2, it would mean checking embedded documents of
embedded documents (normally but not always by means of descending into
child directories). Maybe there's a clearer name for this sort of thing?

OLE2 is nothing special, it's the same with other archive types, you can always have embedded archives within archives.

The OLE2 files aren't always so nice. Some store embedded files as directory entries, some stash them away in records...

Nick
