Re: Merge Request - Temp_ComplexScripts into Trunk

Peter Hancock Thu, 20 Oct 2011 07:32:04 -0700

This is a tough one.

The need for complex script support in FOP is likely high on the wish
list of any global supporter of FOP and it is certainly in the
interest of the project to support. The amount of work that Glenn has
done is considerable: the volume of code, the test coverage and the
obvious depth of domain knowledge.


>From the surface I would have been very much in favor of supporting a
merge in the near future, however I have had the chance to review some
areas of the complex script branch and I have some concerns.
The treatment of Unicode Bidi spans from the creation of the FO Tree
through to the construction of the Area Tree and I would have liked to
have seen the complex scripts solution integrate the Unicode Bidi
Algorithm more directly into the core process:  For example, the
implementation performs a post process on the FO Tree to resolve the
Bidi properties of FONodes relating to text. It would be preferable to
see the construction of the FO Tree embracing this Bidi aspect:
FONodes should be responsible for determining their own bidi state
from the fo node semantics in context to the position in the tree.
Such an implementation would immediately force the maintainer to
consider how a change would effect the Bidi process.

The layout and area tree construction phase interact a little more
directly  with the Bidi process, however most of the Bidi work is
delegated to static methods of a utility class (..layoutmgr.BidiUtil,
also used heavily in the Bidi post-process of the FO Tree) and
consequently require many instanceof expressions because of
differences in Bidi behavior between the Area classes.  Areas should
be responsible for the character re-ordering process.

Having this functionality in FOP is desirable and I would encourage
steps to address these concerns, and those outlined by Chris and
Vincent.

Peter

On Thu, Oct 20, 2011 at 2:53 PM, Vincent Hennebert <vhenneb...@gmail.com> wrote:
> The Complex Scripts feature is obviously a great enhancement and we
> would all love to have it implemented in FOP. However, that should not
> come at the expense of maintainability and the implementation of other
> new features.
>
> When I look at the code in the Temp_ComplexScripts branch, I have
> serious concerns regarding the latter two points. I would oppose merging
> the branch back to Trunk until those are resolved.
>
> Here are the sizes of some new files:
> 1075 src/java/org/apache/fop/fonts/GlyphSequence.java
> 1089 src/java/org/apache/fop/fonts/GlyphProcessingState.java
> 1269 
> src/codegen/unicode/java/org/apache/fop/text/bidi/GenerateBidiTestData.java
> 2034 src/java/org/apache/fop/layoutmgr/BidiUtil.java
> 3449 test/java/org/apache/fop/complexscripts/util/TTXFile.java
>
> This latter one contains more than 50 field
> declarations, and the Handler.startElement method alone is more than
> 1800 lines long.
>
> Also, the o.a.f.fonts.truetype.TTFFile class has now grown to
> 5502 lines. That’s 3 times its original size which was already too big.
> I regularly find myself looking at bits of this class, and I would be
> unable to do so on a 5500 line class.
>
> I don’t think it needs to be explained why big classes are undesirable?
>
> Also, most files disable Checkstyle checks, the most important ones
> being line length and white space. Many files have too long lines which
> makes it a pain to read through, having to horizontally scroll all the
> time. We agreed on a certain coding style in the project and it would be
> good if new code could adhere to it.
>
> Speaking of variable names, here is a method picked from
> o.a.f.fonts.GlyphSequence:
>    /**
>     * Merge overlapping and abutting sub-intervals.
>     */
>    private static int[] mergeIntervals ( int[] ia ) {
>        int ni = ia.length;
>        int i, n, nm, is, ie;
>        // count merged sub-intervals
>        for ( i = 0, n = ni, nm = 0, is = ie = -1; i < n; i += 2 ) {
>            int s = ia [ i + 0 ];
>            int e = ia [ i + 1 ];
>            if ( ( ie < 0 ) || ( s > ie ) ) {
>                is = s;
>                ie = e;
>                nm++;
>            } else if ( s >= is ) {
>                if ( e > ie ) {
>                    ie = e;
>                }
>            }
>        }
>        int[] mi = new int [ nm * 2 ];
>        // populate merged sub-intervals
>        for ( i = 0, n = ni, nm = 0, is = ie = -1; i < n; i += 2 ) {
>            int s = ia [ i + 0 ];
>            int e = ia [ i + 1 ];
>            int k = nm * 2;
>            if ( ( ie < 0 ) || ( s > ie ) ) {
>                is = s;
>                ie = e;
>                mi [ k + 0 ] = is;
>                mi [ k + 1 ] = ie;
>                nm++;
>            } else if ( s >= is ) {
>                if ( e > ie ) {
>                    ie = e;
>                }
>                mi [ k - 1 ] = ie;
>            }
>        }
>        return mi;
>    }
>
> Now I fully appreciate that one has to have some knowledge of an area to
> understand code relating to that area, but no level of expertise,
> however high, will help me to quickly understand this code. This is just
> too easy to mistake one variable for another one when they differ by
> only one letter.
>
> Combined, these would make the code very hard to maintain by anyone
> other than the original author, and this is why I’m opposed to merging
> it to Trunk in its current form. When I commit code I do my very best to
> make it as easy to read and understand by other people, and I would
> really appreciate it I could have the same in return.
>
>
> Thanks,
> Vincent
>
>
> On 19/10/11 19:32, Simon Pepping wrote:
>> Over the past ten years computing has pervaded life in all its facets,
>> and spread over the world. As a consequence computing is now used in
>> all languages and all scripts.
>>
>> When I open my devanagari test file in emacs, it just works. When I
>> open it in firefox, it just works. The same when I open it in
>> LibreOffice Writer. I am sure that, if I would open it in *the* *Word*
>> processor, it would just work. When I process the file with FOP, it
>> does not work. With the complex scripts functionality, it works,
>> dependent on the use of supported or otherwise suitable fonts. (That
>> is true for all above applications, but apparently those come
>> configured with my system.)
>>
>> So what does a person do who believes in the XML stack to maintain his
>> documentation, and wants to send his documents in Hindi to his
>> customers? See that XSL-FO with FOP is not a suitable solution for him
>> because Hindi uses a complex script?
>>
>> FOP needs the complex scripts functionality to remain a player in the
>> global playing field.
>>
>> This is for me the single overarching consideration to want this
>> functionality in FOP's trunk code, and in, say, half a year in a
>> release. All other considerations are minor, unless one wants to claim
>> that this code will block FOP's further development and maintenance in
>> the coming years.
>>
>> Of course, not everybody needs this functionality, and there is a fear
>> of increased maintenance overhead. But the question is: For whom do we
>> develop FOP? Also for the large part of the world that uses complex
>> scripts?
>>
>> With the development of the complex scripts functionality, Glenn Adams
>> and his sponsor Basis Technologies have created a new reality, which
>> is not going to go away. If this functionality does not end up in FOP,
>> it will probably live on elsewhere. If the FOP team is fine with that,
>> say no to the merge request, and feel comfortable with a trusted niche
>> application.
>>
>> Simon Pepping
>>
>> On Wed, Oct 19, 2011 at 09:50:24AM +0100, Chris Bowditch wrote:
>>> On 18/10/2011 19:55, Simon Pepping wrote:
>>>> I merged the ComplexScripts branch into trunk. Result:
>>>
>>> Hi Simon,
>>>
>>> As well of the question of how to do the merge there is also the
>>> question should we do the merge? Of course this is a valuable
>>> feature to the community, and Glenn has invested a lot of time in
>>> its development but is it truely production ready? I asked Vincent
>>> to take a look at the branch earlier in the year as it's a feature
>>> we also need, but he had several concerns that have not be
>>> adequately answered. Take a look at comment #30;
>>> https://issues.apache.org/bugzilla/show_bug.cgi?id=49687#c30
>>>
>>> I'm not sure why Vincent describes it as a "brief look" because he
>>> spent several days on it. I also asked Peter to take a look and he
>>> had similar concerns. 2 or 3 letter variable names are a barrier for
>>> any committer wanting to maintain this code and I don't think it is
>>> a sufficient argument to say that a pre-requisite to maintaining
>>> this code is to be a domain expert. I would hope that any
>>> experienced committer with a debugger should be able to solve some
>>> bugs. Obviously certain problems will require domain expertise, but
>>> the variables names are a key barrier to being able to maintain this
>>> code.
>>>
>>> I realise my comments might be a little controversial and I don't
>>> mean any disrespect to Glenn or his work (which is largely
>>> excellent), but we should at least discuss these topics before the
>>> merge is completed.
>

Re: Merge Request - Temp_ComplexScripts into Trunk

Reply via email to