Lars Strojny wrote:
> Hi Greg,
> 
> Am Samstag, den 29.03.2008, 17:58 -0500 schrieb Greg Beaver:
> [...]
>> If one uses file_put_contents('/path/to/this/file', 'hi') and 
>> '/path/to/this' does not exist, there is an error.  The same is true of 
>> fopen, regardless of mode.  mkdir() even fails unless the recursive 
>> parameter is explicitly specified.  I do not think that this behavior 
>> should be different in ext/phar, especially because all that is needed 
>> is to create 1 directory (full/path/to/) and add the file.
> 
> I mostly agree. Except that the whole argument chain applies to the
> current API. You mentioned mkdir("foo/bar/baz") triggering an error
> without the recursive argument, but Phar['foo/bar/baz'] does not.
> 
> [... Performance concerns ...]
>> $phar['long/path/to/subdir/file'] = 'hi';
>>
>> compared to:
>>
>> $phar['long']['path']['to']['subdir']['file'] = 'hi';
>>
>> is significantly slower.  Each [] results in a call to offsetGet and 
>> instantiation of a new PharFileInfo object plus the final offsetSet call.
> 
> Could you estimate how hard that performance hit would be?

We're talking about a similar difference between iterating over an array
and using an ArrayIterator, which is to say about 20 times slower.  This
is due to the many extra opcodes for each offset retrieval, which then
calls an expensive method call to offsetGet(), creates the directory
inside the phar archive, writes to disk, and continues.  Because we are
writing to disk, without a not-at-all-obvious startBuffering() call,
with large archives, the performance difference could be 100 times
slower or more.

In addition, although Phar does support the idea of thinking of archives
as multi-dimensional arrays, this is an abstraction not supported by the
actual archive file formats.  For other examples, take a look at at
ext/zip.  There is no support for opendir() in the stream wrapper of
ext/zip because it requires the kind of path grepping that pecl/phar
does.  Internally, no archive format stores paths the way directories
are stored in a typical file system.  In other words, typically a
directory is a file that contains a list of other files or directories.
 This allows very quick navigation on a random access disk, but causes
extreme bloat inside an archive.  Files are stored as full paths in what
is the equivalent of a single-dimensional array.  In other words, the
truth of the matter is that phar's ArrayAccess API is a 100% accurate
reflection of the true structure of the actual archives we are
accessing, and the iteration supported is a convenience for simulating
regular file systems.  The API you describe actually layers an
artificial complexity over this design, and as a result requires much
more code complexity.

Let's also remember we're talking about phar creation, which is going to
be far less common than phar usage.  To use a phar, you don't even need
to know what phar *is* let alone what the API is.  One just includes the
phar archive, or adds it to include_path with a phar:// stream wrapper path.

I've spent a significant amount of time actually implementing the new
read-only implementation of offsetGet() in the past day, and although I
think this might be possible I am running into a weird segfault in SPL
on shutdown caused by a double destruction of the object.  In addition,
the patch is not even close to complete, as it does not support the
addition of ArrayAccess to PharFileInfo, which would be required in
order to implement what your RFC describes.  The complexity of the patch
is reaching the hundreds of lines and will be in the thousands of lines,
all to add syntactic sugar that is far more inefficient than the current
syntax - is this really a necessary feature?

As Elizabeth points out, most people are going to build their iterator
by creating a directory structure and then add the entire directory all
at once.  Use of ArrayAccess to write to the archive would be done for
one or two files at most.

When reading from a phar, there is no real incentive to use relative
paths outside of scanning a directory for files (to locate drivers or
plugins, for instance), and iterating over the entire archive is usually
reserved for searching for a particular file.  The ability to glob
through a phar archive would of course be the most ideal solution for
that approach (perhaps globiterator already works?  I haven't tried it).

What I would like to do, however, is to rethink offsetSet().  I think
that we should introduce a property of PharFileInfo called "content"
that is read/write, and can be used to perform the equivalent of
file_get_contents()/file_put_contents().  This will allow a consistency
that is much more intuitive.  offsetSet() could then simply be removed,
or perhaps be refactored as a way to pass in a PharFileInfo and clone
the information.

> Yes, this shouldn't be allowed while
> $phar['foo/bar/baz'] = new DirectoryIterator(); could be allowed.

I will simply add makeEmptyDir() a la zip, although it is too bad
ext/zip never thought to go with mkdir().

>> Phar::compressArchive() /* Steph hated this when I originally proposed 
>> it, perhaps she's changed her mind? */
> 
> I like it. And much better, it does not contain the compression
> format :-)

One problem with this suggestion is that phar renames the archive by
default when compressing.  I would rather we combine compression and
format conversion into a single convert() method that accepts constants.

$phar = new Phar('blah.phar');
$phar = $phar->convert(Phar::GZ); // compress with gzip
$phar = $phar->convert(Phar::TAR | PHAR::GZ); // convert to .phar.tar.gz
$phar = $phar->convert(Phar::NONE | PHAR::DATA); // convert to
non-executable .tar, returns PharData object
$phar = $phar->convert(Phar::EXE); // convert back to .phar.tar

This would also solve a subtle but important problem with the current
conversion API where intermediary files are lying around.

>> Phar::compressAllFiles()
> 
> Wouldn't compressFiles() be fine?

Probably, especially if we remove the compress() method.

>> Lars, I also think it might help to understand my reaction better that 
>> we've been publicly discussing the API of compression and other 
>> phar-related issues on pecl-dev for a while, with no feedback.  
> 
> I would not care so much for phar, if it's inclusion in core has not
> been targeted. 
> 
>> To suddenly see an RFC with major changes without having even been notified 
>> was annoying, I hope that in the future you will at least contact the 
>> leads of the project you are proposing changes for so that minor 
>> disagreements can be worked out in private.
> 
> On http://pecl.php.net/package/phar Marcus is listed as one of the two
> project leads. I've talked to him *before* writing the RFC on friday. So
> this accusation fizzles out. I sort of understand your irritation, but
> from my point of view improvement takes place constantly. In the time
> Phar was discussed on pecl-dev I was not subscribed to it, now I am. So
> the next time I will come up earlier most presumably.

My point is you should contact the leads (plural) of a project.  I've
been breaking my back implementing changes and features requested as far
back as last May, and finally reached a fully functional solution a few
weeks ago.  Your RFC would completely redo a large portion of the API.
I think it is a common courtesy to be consulted prior to publishing the
RFC for the simple reason that I am the one writing the majority of the
code and implementing these changes.

>> Do you want to take a crack at a patch for any of these API features 
>> once they are worked out (buildFromDirectory should be an easy one, if 
>> you would like to start there)?
> 
> Yes, I'm willing to help with it to a degree, as I said in a previous
> mail.

How soon can we expect a patch for this?  Steph also expressed interest,
and we are on a rather tight time schedule to get this to beta, which
will mean a feature freeze (no new crap, just bug fixes).

Let me also be clear: the largest priority right now is on making
pecl/phar work with xdebug, APC and on increasing stability, test
coverage, and ensuring we work out all OS X issues since none of the
devs have an OS X platform to test phar on.

Greg

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to