Re: [xwiki-devs] OfficeImporter API changes - Take 2

Thomas Mortagne Mon, 26 Oct 2009 02:39:18 -0700

On Mon, Oct 26, 2009 at 09:00, Asiri Rathnayake
<[email protected]> wrote:
> Hi,
>
>> OfficeImporter::officeToXHTML(byte[] officeFileData, DocumentName
>> > referenceDocument, boolean filterStyles):XHTMLOfficeDocument
>>
>> Can you explain why you replaced InputStream with byte[]? Arrays require
>> contiguous memory locations and loading large arrays into memory is bad
>> IMO.
>>
>
> Yes, you are correct. I still have some doubts about this:
>
> 1. OpenOffice requires the input to be on a disk file -- so we need to write
> the stream / byte [] into the disk. Having a stream instead of a byte[] is
> good in this case because we won't be loading the whole file into memory at
> once.
>
> 2. However OpenOffice will load the whole file into memory at once when
> performing the conversion. Still, having a stream is kind of better because
> we are not consuming any additional memory.
>
> 3. Currently file upload plugin has no method to get a stream of input. So
> it's returning a byte[]. Having this byte[] converted into a stream is kind
> of useless.
>
> 4. However, having an API based on streams is better because in future we
> might anyway upgrade the fileupload plugin to return a stream.
>
> 5. Even after the conversion I need to load the results into memory (byte[])
> so that I can attach them into wiki pages.
>
> 6. But we might change the DAB api in future so that attachments can be
> saved in an streaming fashion (this might be hard).
>
> I might be wrong here, but I don't think having streams oriented API for
> officeimporter will bring any performance enhancement in near future. All it
> will add is some glue code to convert between streams / byte[] so that it
> can work with rest of XE infrastructure (fileupload plugin, DAB, attachment
> saving).
>
> If we base our decision solely on "having a streamable API is always wise",


+1 for "having a streamable API is always wise" :)

> I'd also go with the streams based API.
>
> I'd like to stay away from this decision and let you all decide :)
>
> Thanks.
>
> - Asiri
>
>
>>
>> Thanks,
>> Marius
>>
>> > OfficeImporter::xhtmlToXDOM(XHTMLOfficeDocument
>> > xhtmlOfficeDocument):XDOMOfficeDocument
>> > OfficeImporter::officeToXDOM(byte[] officeFileData, DocumentName
>> > referenceDocument, boolean filterStyles):XDOMOfficeDocument
>> > OfficeImporter::buildPresentation(byte[]
>> officeFileData):XDOMOfficeDocument
>> > OfficeImporter::splitImport(XDOMOfficeDocument xdomOfficeDocument, int[]
>> > headingLevelsToSplit, NamingCriterion namingCriterion, DocumentName
>> > baseDocumentName):Map<TargetPageDescriptor, XDOMOfficeDocument>
>> > </code>
>> >
>> > As you can see, these methods are more granular and the responsibilities
>> are
>> > well defined. Customizing the import process can be done from the client
>> > code. For an example:
>> >
>> > 1. Make the initial import from office to XHTMLOfficeDocument -
>> > OfficeImporter::officeToXHTML()
>> >
>> > 2. Perform customizations on the XHTMLOfficeDocument - w3c DOM
>> > manipulations.
>> >
>> > 3. Import the XHTMLOfficeDocument into XDOMOfficeDocument -
>> > OfficeImporter::xhtmlToXDOM()
>> >
>> > 4. Perform customizations on the XDOMOfficeDocument (XDOM) - XDOM
>> > manipulations.
>> >
>> > 5. Split the XDOMOfficeDocument into multiple XDOMOfficeDocument
>> instances -
>> > OfficeImporter::splitImport()
>> >
>> > 6. Perform customizations on these child XDOMOfficeDocument instances -
>> XDOM
>> > manipulations.
>> >
>> > 7. Render the XDOMOfficeDocument instances & save them into wiki pages -
>> > XWiki rendering operations.
>> >
>> > I think this interface will make it easy to extend & maintain
>> officeimporter
>> > functionality in the future.
>> >
>> > Along with this, I would also like to refactor the xwiki-refactoring
>> module
>> > a bit to get rid of string based document names from it.
>> >
>> >
>> > This whole refactoring operation would take approximately one day to
>> > complete. And since this operation is not adding any new features, I
>> think
>> > it can be committed on both trunk and 2.0 branch.
>> >
>> > Here's my +1 to all of above.
>> >
>> > Thanks.
>> >
>> > - Asiri
>> > _______________________________________________
>> > devs mailing list
>> > [email protected]
>> > http://lists.xwiki.org/mailman/listinfo/devs
>> _______________________________________________
>> devs mailing list
>> [email protected]
>> http://lists.xwiki.org/mailman/listinfo/devs
>>
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs
>



-- 
Thomas Mortagne
_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs

Re: [xwiki-devs] OfficeImporter API changes - Take 2

Reply via email to