https://issues.apache.org/bugzilla/show_bug.cgi?id=51524

Sergey Vladimirov <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
         OS/Version|                            |All

--- Comment #2 from Sergey Vladimirov <[email protected]> 2011-07-18 18:53:57 UTC ---
Antoni,

Rebuilding the PAPX table is the central point of paragraph processing. It
should not be disabled with a "give me the old POI" switch, because the old POI
was based on an incorrect algorithm.

The previous version looked for all PAPX entries and called them paragraphs.
The current version constructs the text, splits it into paragraphs, and applies
the PAPX to them. I.e., the previous version missed any text in the document
that had no PAPX. For example, in the document you linked, 151 parts of the
text were missing.
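
To illustrate the difference, here is a minimal sketch in Java. It is not the
actual POI code: the Papx and Paragraph classes and both methods are
hypothetical names made up for this example, and picking the first overlapping
PAPX for a paragraph is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

final class ParagraphRebuildSketch {

    /** Hypothetical stand-in for a PAPX entry: formatting for the text range [start, end). */
    static final class Papx {
        final int start, end;
        final String style;
        Papx(int start, int end, String style) {
            this.start = start; this.end = end; this.style = style;
        }
    }

    /** Hypothetical paragraph: a text range plus its style (null if no PAPX covers it). */
    static final class Paragraph {
        final int start, end;
        final String style;
        Paragraph(int start, int end, String style) {
            this.start = start; this.end = end; this.style = style;
        }
    }

    /** Old approach: every PAPX becomes a paragraph; text without a PAPX is lost. */
    static List<Paragraph> fromPapxOnly(List<Papx> papxes) {
        List<Paragraph> result = new ArrayList<>();
        for (Papx p : papxes) {
            result.add(new Paragraph(p.start, p.end, p.style));
        }
        return result;
    }

    /** New approach: split the full text at paragraph marks, then apply a PAPX to each range. */
    static List<Paragraph> fromText(String text, List<Papx> papxes) {
        List<Paragraph> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            boolean atMark = text.charAt(i) == '\r';   // paragraph mark
            boolean atEnd = i == text.length() - 1;    // trailing text without a mark
            if (atMark || atEnd) {
                int end = i + 1;
                result.add(new Paragraph(start, end, findStyle(papxes, start, end)));
                start = end;
            }
        }
        return result;
    }

    /** Simplification: return the style of the first PAPX overlapping [start, end), or null. */
    static String findStyle(List<Papx> papxes, int start, int end) {
        for (Papx p : papxes) {
            if (p.start < end && start < p.end) {
                return p.style;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "one\rtwo\rthree\r";
        List<Papx> papxes = new ArrayList<>();
        papxes.add(new Papx(0, 4, "Heading"));
        papxes.add(new Papx(8, 14, "Body"));   // no PAPX covers "two\r"
        System.out.println("old: " + fromPapxOnly(papxes).size() + " paragraphs");
        System.out.println("new: " + fromText(text, papxes).size() + " paragraphs");
    }
}
```

In this sketch the middle paragraph has no PAPX, so the old approach never
returns it, while the new approach returns it with no style applied.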

So disabling this feature can be useful in only one case: you want to work
with PAPX directly. I have no idea why anyone would want to do that outside of
POI.

Anyway, personally I'm sad that this is done in PapBinTable. It should be done
at a different level, in a completely different place. There should be at least
two different APIs: one for low-level file processing, including parsing PAPX
structures, complex file tables, etc., and a second one for working with the
document: tables, lists, paragraphs, etc. Rebuilding the paragraph list belongs
on the second level.

But making such a change will break the API for sure. So we need to wait until
4.0, where the "old" way of working with PAPX will be provided and preserved in
the "first-level API", along with a DocumentParser with different options in
the "second-level API".

Anyway, again, I have partially rewritten this code and it now takes 4 seconds.
Please let me know if that's acceptable. If possible, please check whether the
parsed data is correct; there could be errors.

