Lars Strojny wrote: > Hi Greg, > > Am Samstag, den 29.03.2008, 17:58 -0500 schrieb Greg Beaver: > [...] >> If one uses file_put_contents('/path/to/this/file', 'hi') and >> '/path/to/this' does not exist, there is an error. The same is true of >> fopen, regardless of mode. mkdir() even fails unless the recursive >> parameter is explicitly specified. I do not think that this behavior >> should be different in ext/phar, especially because all that is needed >> is to create 1 directory (full/path/to/) and add the file. > > I mostly agree. Except that the whole argument chain applies to the > current API. You mentioned mkdir("foo/bar/baz") triggering an error > without the recursive argument, but Phar['foo/bar/baz'] does not. > > [... Performance concerns ...] >> $phar['long/path/to/subdir/file'] = 'hi'; >> >> compared to: >> >> $phar['long']['path']['to']['subdir']['file'] = 'hi'; >> >> is significantly slower. Each [] results in a call to offsetGet and >> instantiation of a new PharFileInfo object plus the final offsetSet call. > > Could you estimate how hard that performance hit would be?
We're talking about a similar difference between iterating over an array and using an ArrayIterator, which is to say about 20 times slower. This is due to the many extra opcodes for each offset retrieval, which then calls an expensive method call to offsetGet(), creates the directory inside the phar archive, writes to disk, and continues. Because we are writing to disk, without a not-at-all-obvious startBuffering() call, with large archives, the performance difference could be 100 times slower or more. In addition, although Phar does support the idea of thinking of archives as multi-dimensional arrays, this is an abstraction not supported by the actual archive file formats. For other examples, take a look at at ext/zip. There is no support for opendir() in the stream wrapper of ext/zip because it requires the kind of path grepping that pecl/phar does. Internally, no archive format stores paths the way directories are stored in a typical file system. In other words, typically a directory is a file that contains a list of other files or directories. This allows very quick navigation on a random access disk, but causes extreme bloat inside an archive. Files are stored as full paths in what is the equivalent of a single-dimensional array. In other words, the truth of the matter is that phar's ArrayAccess API is a 100% accurate reflection of the true structure of the actual archives we are accessing, and the iteration supported is a convenience for simulating regular file systems. The API you describe actually layers an artificial complexity over this design, and as a result requires much more code complexity. Let's also remember we're talking about phar creation, which is going to be far less common than phar usage. To use a phar, you don't even need to know what phar *is* let alone what the API is. One just includes the phar archive, or adds it to include_path with a phar:// stream wrapper path. I've spent a significant amount of time actually implementing the new read-only implementation of offsetGet() in the past day, and although I think this might be possible I am running into a weird segfault in SPL on shutdown caused by a double destruction of the object. In addition, the patch is not even close to complete, as it does not support the addition of ArrayAccess to PharFileInfo, which would be required in order to implement what your RFC describes. The complexity of the patch is reaching the hundreds of lines and will be in the thousands of lines, all to add syntactic sugar that is far more inefficient than the current syntax - is this really a necessary feature? As Elizabeth points out, most people are going to build their iterator by creating a directory structure and then add the entire directory all at once. Use of ArrayAccess to write to the archive would be done for one or two files at most. When reading from a phar, there is no real incentive to use relative paths outside of scanning a directory for files (to locate drivers or plugins, for instance), and iterating over the entire archive is usually reserved for searching for a particular file. The ability to glob through a phar archive would of course be the most ideal solution for that approach (perhaps globiterator already works? I haven't tried it). What I would like to do, however, is to rethink offsetSet(). I think that we should introduce a property of PharFileInfo called "content" that is read/write, and can be used to perform the equivalent of file_get_contents()/file_put_contents(). This will allow a consistency that is much more intuitive. offsetSet() could then simply be removed, or perhaps be refactored as a way to pass in a PharFileInfo and clone the information. > Yes, this shouldn't be allowed while > $phar['foo/bar/baz'] = new DirectoryIterator(); could be allowed. I will simply add makeEmptyDir() a la zip, although it is too bad ext/zip never thought to go with mkdir(). >> Phar::compressArchive() /* Steph hated this when I originally proposed >> it, perhaps she's changed her mind? */ > > I like it. And much better, it does not contain the compression > format :-) One problem with this suggestion is that phar renames the archive by default when compressing. I would rather we combine compression and format conversion into a single convert() method that accepts constants. $phar = new Phar('blah.phar'); $phar = $phar->convert(Phar::GZ); // compress with gzip $phar = $phar->convert(Phar::TAR | PHAR::GZ); // convert to .phar.tar.gz $phar = $phar->convert(Phar::NONE | PHAR::DATA); // convert to non-executable .tar, returns PharData object $phar = $phar->convert(Phar::EXE); // convert back to .phar.tar This would also solve a subtle but important problem with the current conversion API where intermediary files are lying around. >> Phar::compressAllFiles() > > Wouldn't compressFiles() be fine? Probably, especially if we remove the compress() method. >> Lars, I also think it might help to understand my reaction better that >> we've been publicly discussing the API of compression and other >> phar-related issues on pecl-dev for a while, with no feedback. > > I would not care so much for phar, if it's inclusion in core has not > been targeted. > >> To suddenly see an RFC with major changes without having even been notified >> was annoying, I hope that in the future you will at least contact the >> leads of the project you are proposing changes for so that minor >> disagreements can be worked out in private. > > On http://pecl.php.net/package/phar Marcus is listed as one of the two > project leads. I've talked to him *before* writing the RFC on friday. So > this accusation fizzles out. I sort of understand your irritation, but > from my point of view improvement takes place constantly. In the time > Phar was discussed on pecl-dev I was not subscribed to it, now I am. So > the next time I will come up earlier most presumably. My point is you should contact the leads (plural) of a project. I've been breaking my back implementing changes and features requested as far back as last May, and finally reached a fully functional solution a few weeks ago. Your RFC would completely redo a large portion of the API. I think it is a common courtesy to be consulted prior to publishing the RFC for the simple reason that I am the one writing the majority of the code and implementing these changes. >> Do you want to take a crack at a patch for any of these API features >> once they are worked out (buildFromDirectory should be an easy one, if >> you would like to start there)? > > Yes, I'm willing to help with it to a degree, as I said in a previous > mail. How soon can we expect a patch for this? Steph also expressed interest, and we are on a rather tight time schedule to get this to beta, which will mean a feature freeze (no new crap, just bug fixes). Let me also be clear: the largest priority right now is on making pecl/phar work with xdebug, APC and on increasing stability, test coverage, and ensuring we work out all OS X issues since none of the devs have an OS X platform to test phar on. Greg -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php