On Wed, 1 Sep 2010, Andrzej Bialecki wrote:
I was thinking recursive could mean different things. For zip files, tar
files etc, it would probably just mean root directory vs descend into
all directories.
There are no directories in these formats - it's a flat namespace
that just happens to use the filesystem conventions. Java APIs for these
containers also provide only simple iterators. So I'm not sure if
there's any benefit to this distinction here... maybe provide a
FilenameFilter to control what path names to process?
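
To make the "flat namespace" point concrete, here is a minimal sketch using only the JDK's java.util.zip classes. The class name FlatZipListing and the use of Predicate<String> as a stand-in for a FilenameFilter are assumptions for illustration, not an existing or proposed API. The iterator hands back every entry in one pass, and "directories" show up only as prefixes in the entry names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only.
class FlatZipListing {

    /** Returns the names of all non-directory entries accepted by the filter. */
    static List<String> listEntries(InputStream in, Predicate<String> nameFilter)
            throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            // One flat loop: there is no "descend" step, only path-like names.
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory() && nameFilter.test(entry.getName())) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    /** Builds a small in-memory zip so the example is self-contained. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"readme.txt", "docs/a.txt", "docs/img/b.png"}) {
                out.putNextEntry(new ZipEntry(name));
                out.write(name.getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = sampleZip();
        // Everything in the archive, regardless of "directory".
        System.out.println(listEntries(new ByteArrayInputStream(zip), name -> true));
        // A path-name filter gives the effect of selecting one "directory".
        System.out.println(listEntries(new ByteArrayInputStream(zip),
                name -> name.startsWith("docs/")));
    }
}
```

With a filter like this, "root directory only" is just a predicate that rejects names containing a slash, which is why a separate recursive flag adds little for these formats.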
OK, looks like a directory descent on/off isn't a great fit.
I guess we'll want to provide two ways to filter, one by filename (which
is normally available), and one by mime type (which is sometimes
available). Or I guess a callback of "do you want this one?" where we pass
in all the information we have to hand. Any thoughts?
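
One possible shape for the "do you want this one?" callback, as a rough sketch. All names here (EntryInfo, EntrySelector, byName, byMimeType) are hypothetical and made up for this example. The idea is that the callback gets whatever we know, with the name and mime type both optional, and filtering by either one is then just a convenience built on top.

```java
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical: everything we have to hand about one embedded entry.
final class EntryInfo {
    private final String name;      // usually known, may be null
    private final String mimeType;  // only sometimes known, may be null

    EntryInfo(String name, String mimeType) {
        this.name = name;
        this.mimeType = mimeType;
    }

    Optional<String> name() {
        return Optional.ofNullable(name);
    }

    Optional<String> mimeType() {
        return Optional.ofNullable(mimeType);
    }
}

// Hypothetical: the single "do you want this one?" callback.
interface EntrySelector {
    boolean accept(EntryInfo info);

    /** Filter by filename; entries with no name are rejected. */
    static EntrySelector byName(Predicate<String> namePredicate) {
        return info -> info.name().map(namePredicate::test).orElse(false);
    }

    /** Filter by mime type; the caller decides what happens when it is unknown. */
    static EntrySelector byMimeType(String wanted, boolean acceptIfUnknown) {
        return info -> info.mimeType().map(wanted::equals).orElse(acceptIfUnknown);
    }
}

class EntrySelectorDemo {
    public static void main(String[] args) {
        EntrySelector pdfOnly = EntrySelector.byMimeType("application/pdf", false);
        EntrySelector textFiles = EntrySelector.byName(n -> n.endsWith(".txt"));

        EntryInfo pdf = new EntryInfo("report.pdf", "application/pdf");
        EntryInfo unknown = new EntryInfo("notes.txt", null);

        System.out.println(pdfOnly.accept(pdf));
        System.out.println(pdfOnly.accept(unknown));
        System.out.println(textFiles.accept(unknown));
    }
}
```

The acceptIfUnknown flag is there because the mime type is only sometimes available, so the caller has to choose whether a missing type means "skip" or "take it anyway".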
On the other hand I see a benefit in having an option to automatically
descend into embedded archives.
So we'd have some sort of filtering, and the descend yes/no option? For a
zip, the former exposes all files from all "directories", and the latter
will cause it to descend into both embedded zips and other embedded
containers like .doc? For a .docx, the former exposes all embedded files
(but none of the OOXML file format stuff), and the latter controls if
other embedded office documents are processed?
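
To illustrate how the filter and the descend yes/no option could combine, here is a sketch for the zip-inside-zip case only, using plain JDK classes. NestedZipWalker and its methods are hypothetical names, spotting an embedded archive by its ".zip" extension is a simplification, and other containers such as .doc are not handled. Whether the embedded archive itself should also be reported when descending is an open design choice; this sketch reports it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical walker, for illustration only.
class NestedZipWalker {

    /** Lists accepted entries; with descend=true, embedded zips are opened too. */
    static List<String> walk(byte[] zipBytes, boolean descend, Predicate<String> filter)
            throws IOException {
        List<String> found = new ArrayList<>();
        walk(zipBytes, "", descend, filter, found);
        return found;
    }

    private static void walk(byte[] zipBytes, String prefix, boolean descend,
                             Predicate<String> filter, List<String> found) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                String path = prefix + entry.getName();
                if (filter.test(path)) {
                    found.add(path);
                }
                // Simplification: treat anything named *.zip as an embedded archive.
                boolean isZip = entry.getName().toLowerCase(Locale.ROOT).endsWith(".zip");
                if (descend && isZip) {
                    // Buffer the entry so the outer stream is not closed by the inner one.
                    walk(readAll(zip), path + "!/", true, filter, found);
                }
            }
        }
    }

    /** Reads the rest of the current entry into memory. */
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    static byte[] zipOf(Map<String, byte[]> entries) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                out.putNextEntry(new ZipEntry(e.getKey()));
                out.write(e.getValue());
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Outer zip holding a.txt and inner.zip, where inner.zip holds b.txt. */
    static byte[] sample() throws IOException {
        Map<String, byte[]> inner = new LinkedHashMap<>();
        inner.put("b.txt", "b".getBytes(StandardCharsets.UTF_8));
        Map<String, byte[]> outer = new LinkedHashMap<>();
        outer.put("a.txt", "a".getBytes(StandardCharsets.UTF_8));
        outer.put("inner.zip", zipOf(inner));
        return zipOf(outer);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(walk(sample(), false, p -> true));
        System.out.println(walk(sample(), true, p -> true));
    }
}
```

Here the filter sees the full path including the embedding archive (the "!/" separator is just a convention picked for the sketch), so one callback can decide about entries at any depth.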
For OLE2, it would mean checking embedded documents of
embedded documents (normally but not always by means of descending into
child directories). Maybe there's a clearer name for this sort of thing?
OLE2 is nothing special, it's the same with other archive types, you can
always have embedded archives within archives.
The OLE2 files aren't always so nice. Some store embedded files as
directory entries, some stash them away in records...
Nick