Hi Thierry,

I think this sounds quite interesting. Certainly, a better "document"
story (which includes full-text indexing and a strategy to avoid ZODB
bloat, e.g. blobfile) is pretty high on my wishlist for 3.5 (and
limi's as well, fwiw).

I would like to see a proposal that is somewhat less AT-centric,
though. It may be wishful thinking that we can achieve this, but
ideally we'd decouple portal_transforms entirely, replacing it with a
lighter framework based on Zope 3 adapters and utilities (a transform
is a utility; adapters take care of extracting the data to transform
and of consuming the transformed text). This should also allow an
asynchronous option (register a consumer for the transform that is
called when the transform is complete).
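A minimal sketch of the transform-as-utility, adapters-for-extraction-and-consumption split described above. This is an assumption, not an existing API. To keep it self-contained, a plain dict stands in for the Zope 3 component registry (in real code this would go through zope.component, e.g. utility and adapter lookup). The names register_transform, get_transform, TextToHTML, SourceAdapter, PreviewConsumer, update_preview and File are all hypothetical.

```python
import html
from typing import Dict, Protocol, Tuple


class ITransform(Protocol):
    """A transform utility: converts data of one MIME type into another."""

    def convert(self, data: str) -> str: ...


# Hypothetical stand-in for a Zope 3 utility registry,
# keyed by (source, target) MIME types.
_transforms: Dict[Tuple[str, str], ITransform] = {}


def register_transform(source: str, target: str, transform: ITransform) -> None:
    _transforms[(source, target)] = transform


def get_transform(source: str, target: str) -> ITransform:
    return _transforms[(source, target)]


class TextToHTML:
    """Hypothetical transform utility: wraps escaped plain text in <pre>."""

    def convert(self, data: str) -> str:
        return "<pre>" + html.escape(data) + "</pre>"


class SourceAdapter:
    """Adapter: extracts the data to transform from a content object."""

    def __init__(self, context):
        self.context = context

    def mimetype(self) -> str:
        return self.context.content_type

    def data(self) -> str:
        return self.context.body


class PreviewConsumer:
    """Adapter: consumes the transformed text by storing it on the object."""

    def __init__(self, context):
        self.context = context

    def consume(self, result: str) -> None:
        self.context.preview = result


def update_preview(context, target: str = "text/html") -> None:
    # Look up the transform utility for this source/target pair, then let
    # the adapters handle extraction and consumption.
    source = SourceAdapter(context)
    transform = get_transform(source.mimetype(), target)
    PreviewConsumer(context).consume(transform.convert(source.data()))


class File:
    """Minimal stand-in for an ATFile-like content object."""

    def __init__(self, content_type: str, body: str):
        self.content_type = content_type
        self.body = body
        self.preview = None


register_transform("text/plain", "text/html", TextToHTML())
f = File("text/plain", "a < b")
update_preview(f)
print(f.preview)  # <pre>a &lt; b</pre>
```

The content class knows nothing about transforms. Swapping how data is extracted or where the result goes only means registering a different adapter, which is what would let ATFile be extended rather than replaced.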

Once that is in place, we could extend ATFile relatively easily to use
it. I don't think we'd want a new content type; rather, we should
extend ATFile as necessary.

I think BLOB storage and transform should be two separate proposals
and two separate implementations.



I'd like to make a proposal that extends PLIP #177.

We have developed a Plone component that stores a file together with its
HTML preview: ATFilePreview.

It does the following:

- make the file available for download

- create an HTML preview of the file

- index the file's content for full-text search

It has the following advantages:

- it uses the MIME types registry (mimetypes_registry) to detect MIME types

- it uses portal_transforms to create the preview, and then extracts the
text to be indexed from that preview

- it stores both the HTML preview and all its sub-objects inside the
object, as persistent sub-objects

- it's fully generic: it already previews and indexes OpenDocument
files, MS Office documents, PDF, RTF, HTML, Python source, etc. It can
also show a preview for zip files, video files, audio files or whatever
else you can imagine. Take the example of a video file: you may decide
that every uploaded video is transcoded to MKV and streamed in the page
via a Java applet that plays it. All you need is a video_to_html
transform that does this. The result is stored together with the
original file, and the HTML preview is displayed.

- the trunk (it's in the collective) stores everything inside the object
in the ZODB, so it has no extra dependencies and can replace normal file
objects

- there is another version that stores the file, the HTML preview and
the sub-objects on the filesystem. It currently uses FSS
(FileSystemStorage), but we'd like to move it to BlobFile, as FSS is a
bit too complex for our use case.

- we don't need all the TING machinery to get full-text indexing: we
only need the UnicodeLexicon, as long as portal_transforms returns
Unicode results (tested in France, as you can imagine ;-) )

- we already have transforms for all office file formats in
AROfficesTransform, which we are currently integrating into
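The detect, preview and index flow listed above can be sketched roughly as follows. This is a hedged illustration under stated assumptions, not ATFilePreview's actual code:

- Python's standard mimetypes module stands in for mimetypes_registry.
- A hypothetical TRANSFORMS dict stands in for portal_transforms.
- A Unicode-aware regex split stands in for the UnicodeLexicon.
- FilePreview and its attributes are illustrative names only.

The point it shows is that the indexed words come from the HTML preview, not from the raw file.

```python
import html
import mimetypes
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(markup: str) -> str:
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.parts)


def split_words(text: str) -> list:
    # For str patterns, \w is Unicode-aware in Python 3, so accented words
    # such as "éléphant" stay whole (the job of the UnicodeLexicon here).
    return re.findall(r"\w+", text.lower())


# Hypothetical transforms keyed by source MIME type; each returns HTML.
TRANSFORMS = {
    "text/plain": lambda data: "<pre>" + html.escape(data) + "</pre>",
}


class FilePreview:
    """Hypothetical object holding the original file and its HTML preview."""

    def __init__(self, filename: str, data: str):
        self.filename = filename
        self.data = data  # original file, kept for download
        self.mimetype, _ = mimetypes.guess_type(filename)
        self.preview = TRANSFORMS[self.mimetype](data)
        # Text to index is derived from the preview, not the raw file.
        self.searchable_words = split_words(html_to_text(self.preview))


doc = FilePreview("notes.txt", "L'éléphant est là")
print(doc.mimetype)          # text/plain
print(doc.searchable_words)  # ['l', 'éléphant', 'est', 'là']
```

Supporting a new format (say, video) would then only mean registering another entry that produces HTML. Indexing keeps working unchanged because it only ever sees the preview.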

There are two further things to consider:

- running portal_transforms may overload the Zope server

- some files may need decorators to properly handle specific extra
fields (especially multimedia files: metadata, etc.)

* Concerning overload of the Zope server: I think we should have an
asynchronous portal transform that could run as a separate Twisted
daemon. It could live alongside portal_transforms and be called
asynchronous_portal_transform (APT). The only difference from
portal_transforms is that we need to pass APT a callback method, so that
it can send back the result of the transform once it is done. So if a
content type is APT-aware and APT is activated, APT is used instead of
portal_transforms. This makes it possible to move the load to one or
more dedicated servers, for example. We could also take a look at
BlueDCS (I've only just heard of it and haven't tried it).
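A rough sketch of the callback-based APT idea, under explicit assumptions. A concurrent.futures thread pool stands in for the separate Twisted daemon; a real deployment would ship the work to another process or machine. The names AsyncPortalTransforms, slow_transform, transform and apt_aware are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


def slow_transform(data: str) -> str:
    """Stand-in for an expensive conversion (e.g. office document to HTML)."""
    return "<p>" + data.upper() + "</p>"


class AsyncPortalTransforms:
    """Hypothetical APT: runs transforms off the request path, reports via callback."""

    def __init__(self, workers: int = 2):
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def convert(self, data: str, callback: Callable[[str], None]) -> Future:
        future = self._pool.submit(slow_transform, data)
        # The callback receives the result once the transform has completed.
        future.add_done_callback(lambda f: callback(f.result()))
        return future

    def shutdown(self) -> None:
        # Waits for pending transforms (and the callbacks they trigger).
        self._pool.shutdown(wait=True)


def transform(obj, data: str, apt: Optional[AsyncPortalTransforms] = None):
    """Use APT if it is activated and the object is APT-aware; else run synchronously."""
    if apt is not None and getattr(obj, "apt_aware", False):
        return apt.convert(data, obj.set_preview)
    obj.set_preview(slow_transform(data))
    return None


class Doc:
    apt_aware = True

    def __init__(self):
        self.preview = None

    def set_preview(self, result: str) -> None:
        self.preview = result


class PlainDoc(Doc):
    apt_aware = False


apt = AsyncPortalTransforms()
doc, plain = Doc(), PlainDoc()
transform(doc, "hello", apt)   # queued; set_preview is called later
transform(plain, "sync", apt)  # not APT-aware: converted immediately
apt.shutdown()
print(doc.preview)    # <p>HELLO</p>
print(plain.preview)  # <p>SYNC</p>
```

The content object only exposes a consumer method (set_preview here). It does not need to know whether the result arrives immediately or later from another worker.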

* Concerning the decorators: there should be a kind of
decorators_registry that would allow adding decorators based on MIME type.
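One possible shape for such a decorators_registry, given only as a sketch. It assumes decorators are plain callables that add fields to a dict. It also assumes a key may be either an exact MIME type or a major-type wildcard like "video/*". DecoratorsRegistry, add_duration and add_codec are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Decorator = Callable[[dict], None]


class DecoratorsRegistry:
    """Hypothetical registry mapping MIME types to field decorators."""

    def __init__(self):
        self._by_type: Dict[str, List[Decorator]] = defaultdict(list)

    def register(self, mimetype: str, decorator: Decorator) -> None:
        self._by_type[mimetype].append(decorator)

    def lookup(self, mimetype: str) -> List[Decorator]:
        major = mimetype.split("/", 1)[0]
        # Exact-type decorators first, then the major-type wildcard ones.
        return self._by_type.get(mimetype, []) + self._by_type.get(major + "/*", [])

    def decorate(self, mimetype: str, fields: dict) -> dict:
        for decorator in self.lookup(mimetype):
            decorator(fields)
        return fields


def add_duration(fields: dict) -> None:
    fields.setdefault("duration", None)  # e.g. filled from video metadata


def add_codec(fields: dict) -> None:
    fields.setdefault("codec", None)


registry = DecoratorsRegistry()
registry.register("video/*", add_duration)
registry.register("video/x-matroska", add_codec)
fields = registry.decorate("video/x-matroska", {"title": "talk"})
print(fields)  # {'title': 'talk', 'codec': None, 'duration': None}
```

Keying on MIME type mirrors the way transforms are already selected. A file type with no registered decorators simply gets its fields back unchanged.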

What do you think of all these points?

Best regards,



Framework-Team mailing list
