I think there are several interwoven levels to think about:

a) Which use cases do we support (parsing, text extraction, merging, form 
filling, viewing, creation …)? Do we need more? Can we drop some?
b) Do we have a good architecture to support these use cases?
c) How do we organize the major parts? I think there is already a feeling that 
PDFBox should be modularized in one way or another.
d) Which dependencies do we have, and where (this might belong to b)? E.g. is 
it a good idea that PDDocument needs AWT? Where are the boundaries from 
byte/file level to COS to PD model to apps/tools/utilities …?
e) Which PDF functionality is missing? E.g. do we need better support for 
different PDF versions?
f) Efficiency and memory consumption: e.g. do we need something like lazy 
loading?
g) As Thomas wrote: type safety, generics … and maybe better object 
orientation. E.g. today some parsing is done in the parsers and some in the 
COS objects (COSString).
…….
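
To make point f) a bit more concrete, here is a minimal sketch of what lazy 
loading could look like: an indirect object is parsed only the first time it is 
dereferenced. The names (LazyObject, the Supplier standing in for "seek to the 
offset and parse") are hypothetical and are not existing PDFBox classes.

```java
import java.util.function.Supplier;

/** Hypothetical holder for an indirect object that is parsed on first access. */
final class LazyObject<T> {
    private final Supplier<T> loader; // stands in for "seek to offset and parse"
    private T value;
    private boolean loaded;

    LazyObject(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Runs the loader on the first call and returns the cached value afterwards. */
    T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }

    /** True once the object has been parsed; useful for memory diagnostics. */
    boolean isLoaded() {
        return loaded;
    }
}
```

This version is not thread-safe; whether it needs to be would depend on how the 
document model is shared.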

Which API do we agree to keep stable? Is it COS…, PD…?

Thinking about these 'levels' doesn't mean that we have to address all of 
them immediately (or at all), but it will help to set expectations.


My initial thoughts are:

2.0
o get the API levels right: byte/token -> COS -> PD -> utils/AWT/… -> apps 
(Debugger, Reader…; will we keep all of them?)
o "guarantee" the PD level for 2.x: that is our API, which means we can freely 
change everything above and below it in the 2.x branch. Document that!
o type safety, collections … on PD level first
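
As a rough illustration of the layering and of type safety on PD level, a 
sketch could look like this: an untyped low-level dictionary below, and a typed 
wrapper above that is the only thing callers see. RawDictionary and PageInfo 
are hypothetical names used for illustration, not the actual COS or PD classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical low-level ("COS"-like) dictionary: keys are names, values are untyped. */
final class RawDictionary {
    private final Map<String, Object> entries = new LinkedHashMap<>();

    void put(String key, Object value) {
        entries.put(key, value);
    }

    /** Returns the entry only if it is present and has the requested type. */
    <T> Optional<T> get(String key, Class<T> type) {
        Object value = entries.get(key);
        return type.isInstance(value) ? Optional.of(type.cast(value)) : Optional.empty();
    }
}

/** Hypothetical high-level ("PD"-like) view: callers only see typed accessors. */
final class PageInfo {
    private final RawDictionary dict;

    PageInfo(RawDictionary dict) {
        this.dict = dict;
    }

    /** Rotation in degrees; falls back to 0 when the entry is missing or not an integer. */
    int getRotation() {
        return dict.get("Rotate", Integer.class).orElse(0);
    }

    void setRotation(int degrees) {
        dict.put("Rotate", degrees);
    }
}
```

If only the typed level is guaranteed, the dictionary underneath could change 
freely within 2.x without breaking callers.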

2.1
o improved parser
o improved object model below PD, e.g. decide whether parsing from tokens to 
COS is done in the parser or in the COS objects, but not in both
o more type safety
…..
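
One way to read "not mixed" is sketched below: the value object knows nothing 
about PDF syntax, and all decoding of a literal string lives in the parser. 
StringValue and LiteralStringParser are hypothetical names; the sketch assumes 
the token is a complete literal string in parentheses and leaves out octal 
escapes and line continuations.

```java
/** Hypothetical value object: holds decoded text, knows nothing about PDF syntax. */
final class StringValue {
    private final String text;

    StringValue(String text) {
        this.text = text;
    }

    String getText() {
        return text;
    }
}

/** Hypothetical parser: all knowledge of literal-string syntax lives here. */
final class LiteralStringParser {
    private LiteralStringParser() {
    }

    /** Decodes a parenthesized literal string token into a StringValue. */
    static StringValue parse(String token) {
        int end = token.length() - 1;
        if (end < 1 || token.charAt(0) != '(' || token.charAt(end) != ')') {
            throw new IllegalArgumentException("not a literal string: " + token);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < end; i++) {
            char c = token.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 1 >= end) {
                throw new IllegalArgumentException("dangling escape in: " + token);
            }
            char escaped = token.charAt(++i);
            switch (escaped) {
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                default: out.append(escaped); // escaped parenthesis or backslash
            }
        }
        return new StringValue(out.toString());
    }
}
```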

2.2
o improved writer
o new "incremental" writer/stamper
o handling of non-WinAnsi encodings
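
For the "incremental" writer, the object pool Thomas describes in the quoted 
mail below could be sketched roughly like this: objects sit in a flat, ordered 
pool with a "changed" flag, and an incremental save just iterates and emits the 
changed ones. ObjectPool and its methods are hypothetical; generation number 0 
is assumed, and the cross-reference section and trailer are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pool of indirect objects, kept in insertion order. */
final class ObjectPool {
    /** One pooled object: serialized body plus a flag marking unsaved changes. */
    private static final class Pooled {
        final String body;
        boolean dirty;

        Pooled(String body, boolean dirty) {
            this.body = body;
            this.dirty = dirty;
        }
    }

    private final Map<Integer, Pooled> objects = new LinkedHashMap<>();

    /** Registers an object read from an existing file; it starts out unchanged. */
    void addExisting(int number, String body) {
        objects.put(number, new Pooled(body, false));
    }

    /** Adds a new object or replaces an existing one, and marks it as changed. */
    void put(int number, String body) {
        objects.put(number, new Pooled(body, true));
    }

    /** Flat iteration, no recursion: returns the changed objects and clears their flags. */
    List<String> writeIncrement() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, Pooled> e : objects.entrySet()) {
            Pooled pooled = e.getValue();
            if (pooled.dirty) {
                out.add(e.getKey() + " 0 obj\n" + pooled.body + "\nendobj");
                pooled.dirty = false;
            }
        }
        return out;
    }
}
```

A real writer would also have to record byte offsets for the new 
cross-reference section; the point here is only that the iteration is flat.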


BR
Maruan

On 18.04.2013 at 23:04, Thomas Chojecki <i...@rayman2200.de> wrote:

> Hi,
> the idea with the branch sounds good, but before doing it, as Andreas says, 
> we need to clarify the changes first. A branch is extra work to keep up to 
> date with the trunk. If we refactor PDFBox, it will be very hard to merge 
> patches between these versions.
> 
> Back to topic.
> - I think we should try to focus on type safety first: more generic 
> collections and code cleanup.
> - The other thing is the COSWriter. Writing recursively through the document 
> sounds good, but it makes it hard to update existing documents 
> incrementally. A better way is a linked-list-like object pool. Iterating 
> through the list and writing the objects is much easier than running 
> recursively with cross-references and additional checks.
> - An improved PDF parsing mechanism for broken documents.
> 
> This is an open source lib, and with developers who work on it only in their 
> free time, this can take some time.
> 
> On the other hand we have some bugs. Implementing new features won't fix them.
> 
> I have mixed feelings and no idea what is best for PDFBox.
> 
> Look at some bigger projects like Eclipse 3.x to 4.x, which in my opinion 
> messed up the IDE: unfixed bugs, more new bugs and a rewritten lazy framework.
> 
> I don't know, but when we start, I think the first point, type safety, will 
> be a good entry point.
> 
> Best regards
> Thomas
> 
> 
> On 18.04.2013 22:15, Maruan Sahyoun wrote:
>> I'd think that we should start scoping out 2.0 - what will be covered under 
>> that topic. In addition I would see us doing additional bug fix releases and 
>> minor enhancements prior to releasing 2.0. My preference would be to branch 
>> out 2.0 and keep trunk for working on 1.x as this would be clearer but maybe 
>> we should postpone that discussion until we have a better understanding what 
>> 2.0 means.
>> 
>> Maruan Sahyoun
>> 
>> 
>> On 18.04.2013 at 21:11, Andreas Lehmkuehler <andr...@lehmi.de> wrote:
>> 
>>> Hi,
>>> 
>>> what is our next target after releasing 1.8.0 and 1.8.1?
>>> 
>>> We already started some discussions about that topic, but I'd like to have
>>> clarification. Is it time to go for a 2.0 version? If we agree to that goal,
>>> how should we proceed? Should we branch or simply use the trunk?
>>> 
>>> I'd prefer to continue using the trunk. We are still able to release
>>> bugfix versions using the 1.8-branch. Even a new 1.9 feature release
>>> should be possible by branching the 1.8-branch.
>>> 
>>> WDYT?
>>> 
>>> BR
>>> Andreas Lehmkühler
>> 
> 
