Re: Review: Bo Yang -- libdom Core

John-Mark Bell Thu, 23 Jul 2009 07:18:35 -0700

On Thu, 2009-07-23 at 21:13 +0800, Bo Yang wrote:
> Thanks for your review, jmb.
> 
> >
> > In general, this is ok. There are quite a lot of typos in this code --
> > mostly in comments, though some in code. I've highlighted the code ones,
> > but it'd be good if the ones in comments are fixed, too.
> 
> Forgive me for my poor English and careless, I will check them out.
> 
> > seem to be a fair few functions that are lacking API documentation. This
> > needs fixing before merge.
> 
> Yes, I will add comments for all API.


Thanks.

> >
> > There's a bunch of reference counting stuff that's not quite right in
> > this diff, but as you're working on this, I've not commented on it.
> 
> Now, most of the ref count stuff should be fixed by the leakage test phase.
> 
> >> Index: test/run-test.sh
> >> ===================================================================
> >> --- /dev/null 2009-04-16 19:17:07.000000000 +0100
> >> +++ test/run-test.sh  2009-07-15 02:08:31.000000000 +0100
> >> @@ -0,0 +1,130 @@
> >> +#!/bin/bash
> >
> > Bash is not available on all supported build hosts. Thus, this script is
> > not portable.
> >
> > Compilation of test binaries should be handled by the Makefile.
> >
> > Running the test binaries should be performed by a testrunner written in
> > Perl. There's one in the core buildsystem that may be suitable.
> 
> I want to migrate the test framework to Makefile and testrunner after
> HTML module, because the now test framework can work and migrating it
> will cost much time. How do you think about this?

As long as there is a plan, then I don't mind. What I don't want to see
happen is that the current system gets left in place because noone gets
around to replacing it.

> >> Index: test/README
> >> ===================================================================
> >> --- /dev/null 2009-04-16 19:17:07.000000000 +0100
> >> +++ test/README       2009-07-15 02:08:31.000000000 +0100
> >> @@ -0,0 +1,2 @@
> >> +Note the lib/ directory contain the old test related files, although they
> >> +are not workable, I leave them here for later use if there is a chance.
> >
> > Is there any need for this. Also, why preserve the lib directory --
> > version control does that for you.
> 
> Removed.

Ok.

> >> +     DERIVATION_RESTRICTION  = 0x00000001,
> >> +     DERIVATION_EXTENSION    = 0x00000002,
> >> +     DERIVATION_UNION        = 0x00000004,
> >> +     DERIVATION_LIST         = 0x00000008
> >> +} dom_type_info_derivation_method;
> >
> > Is this a bitfield?
> 
> Yes, it is, see
> http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#TypeInfo.
> 
> >> +dom_exception resource_mgr_create_hashtable(struct resource_mgr *res,
> >> +             size_t chains, hash_func f, struct hash_table **ht);
> >> +
> >> +#endif
> >
> > I'm not sure I understand the need for a resource manager. Can you
> > explain?
> >
> 
> Generally, a resource manager is used to alloc/dealloc memory and
> intern strings. In the now implementation, the dom_document itself is
> both a Document and a resource manager. I think it is better to
> separate the resource management function out of the Document and make
> it a single object as resource_manager.

Well, all allocation (aside from DocumentType nodes) occurs in the
context of a document. Thus, by definition, the Document has to manage
resources. It doesn't really make sense to me to have separate resource
management objects floating around.

> >> +#include <libwapcaplet/libwapcaplet.h>
> >> +
> >> +#include "core/node.h"
> >> +#include "core/document_type.h"
> >> +
> >> +#include "utils/utils.h"
> >> +#include "utils/validate.h"
> >> +#include "utils/namespace.h"
> >> +
> >> +#include "bootstrap/implementation.h"
> >> +
> >> +static dom_alloc _alloc;
> >> +static void * _pw;
> >
> > Lose the space after '*'.
> >
> > I'm not comfortable about these being static. Is there any guarantee
> > that they're shared by different implementation instances? Note that
> > DOMImplementations may need language-specific hooks (e.g. for
> > normalisation).
> >
> 
> I am sorry but I did not realize why a single static allocator is not enough.

Is it possible for there to be multiple DOMImplementations? If so, then
a static allocator simply won't work.

> >> +/* Clone a dom_string */
> >> +dom_exception dom_string_clone(dom_alloc alloc, void *pw, struct 
> >> dom_string *str,
> >> +             struct dom_string **ret);
> >
> > Would this be better named as _copy()?
> 
> It is not just copying, it realloc the string with new allocator, so I
> think clone is better.

Right. This needs explaining in the documentation.

> >> +/* Create a DOM string from a buffer
> >> + * We will not alloc any memory in this method because the client provide
> >> + * it. This function call mainly used for create a string from lwc_string 
> >> */
> >> +dom_exception dom_string_create_from_lwcstring(dom_alloc alloc, void *pw,
> >> +             struct lwc_context_s *ctx, struct lwc_string_s *str,
> >> +             struct dom_string **ret);
> >
> > The documentation here is wrong. Also, does this not complicate things
> > hugely for the client? They'd have to know they're dealing with an
> > lwc_string.
> >
> 
> I have changed the comment and move the API to internal.

Ok.

> >> @@ -117,6 +120,12 @@
> >>  dom_exception dom_document_create_string(struct dom_document *doc,
> >>               const uint8_t *data, size_t len, struct dom_string **result);
> >>
> >> +dom_exception dom_document_create_lwcstring(struct dom_document *doc,
> >> +             const uint8_t *data, size_t len, struct lwc_string_s 
> >> **result);
> >
> > What is this for? Why is it in the public API?
> >> +dom_exception dom_document_create_string_from_lwcstring(struct 
> >> dom_document *doc,
> >> +             struct lwc_string_s *str, struct dom_string **result);
> >> +
> >
> > Similarly.
> >
> 
> If the client has a lwc_string * and want to call our API which take a
> dom_string, this function is very helpful for that situation and it is
> a common situation.

That doesn't explain why dom_document_create_lwcstring() is in the
public API.

> >> Index: include/dom/core/implementation.h
> >> ===================================================================
> >> --- include/dom/core/implementation.h (revision 8546)
> >> +++ include/dom/core/implementation.h (working copy)
> >> @@ -30,20 +30,19 @@
> >>  dom_exception dom_implementation_get_feature(
> >>               struct dom_implementation *impl,
> >>               struct dom_string *feature, struct dom_string *version,
> >> -             void **object,
> >> -             dom_alloc alloc, void *pw);
> >> +             void **object);
> >
> > How is this going to create an object if it doesn't know what allocator
> > to use? An implementation may be used by more than one document.
> 
> I intend to use the static allocator above to allocate the object.
> Generally, the bootstrap part all use the static allocator which is
> set by calling dom_implregistry_dom_implementation_initialise. And
> that function is called by the dom_initialise, this means when the
> client initialise DOM, he should provide the allocator for
> implementation related stuff. And when he create document, it provide
> the document allocator then.

Ok.

> >> Index: include/dom/core/namednodemap.h
> >> ===================================================================
> >> --- include/dom/core/namednodemap.h   (revision 8546)
> >> +++ include/dom/core/namednodemap.h   (working copy)
> >> @@ -80,4 +80,11 @@
> >>               _dom_namednodemap_remove_named_item_ns((dom_namednodemap *) 
> >> (m), \
> >>               (dom_string *) (n), (dom_string *) (l), (dom_node **) (r))
> >>
> >> +bool _dom_namednodemap_equal(struct dom_namednodemap *m1,
> >> +             struct dom_namednodemap *m2);
> >> +
> >> +#define dom_namednodemap_equal(m1, m2) _dom_namednodemap_equal( \
> >> +             (struct dom_namednodemap *) (m1), \
> >> +             (struct dom_namednodemap *) (m2))
> >
> > Is this not implementation detail? Why is it in the public API?
> 
> I think our clients may need to test whether two NodeMap are equal.
> Such as in JS.

Would they not just perform simple equality checking -- i.e. comparing
the objects' addresses in memory?

> >> Index: include/dom/bootstrap/implregistry.h
> >> ===================================================================
> >> --- include/dom/bootstrap/implregistry.h      (revision 8546)
> >> +++ include/dom/bootstrap/implregistry.h      (working copy)
> >> @@ -15,16 +15,18 @@
> >>  struct dom_implementation_list;
> >>  struct dom_string;
> >>
> >> +/* Initialise the dom_implementation */
> >> +dom_exception dom_impleregistry_dom_implementation_initialise(
> >> +             dom_alloc allocator, void *ptr);
> >
> > Why does the registry need initialising?
> 
> Set the allocator for various implementation stuff alloc/dealloc.

Right.

> >
> >>  /* Retrieve a DOM implementation from the registry */
> >>  dom_exception dom_implregistry_get_dom_implementation(
> >>               struct dom_string *features,
> >> -             struct dom_implementation **impl,
> >> -             dom_alloc alloc, void *pw);
> >> +             struct dom_implementation **impl);
> >>
> >>  /* Get a list of DOM implementations that support the requested features 
> >> */
> >>  dom_exception dom_implregistry_get_dom_implementation_list(
> >>               struct dom_string *features,
> >> -             struct dom_implementation_list **list,
> >> -             dom_alloc alloc, void *pw);
> >> +             struct dom_implementation_list **list);
> >
> > These API changes prevent implementation objects being created on the
> > fly.
> 
> I am sorry, but could you please why?

It turns out that I completely misunderstood this change. Ignore my
comments here.

> >> -     } else {
> >> +     } else if (colon == 0) {
> >> +        /* Some name like ":name" */
> >> +        if (namespace != NULL)
> >> +            return DOM_NAMESPACE_ERR;
> >
> > ":name" means an element in a namespace with no prefix. It's perfectly
> > valid, assuming I'm remembering XML namespaces correctly.
> 
> In the spec, http://www.w3.org/TR/2004/REC-xml-names11-20040204/#ns-qualnames
> . It said a QName can't start with a ':' character.

So it does. I stand corrected.

> >> +
> >> +     while (slen > 0) {
> >> +             err = parserutils_charset_utf8_to_ucs4(s, slen, &c, &clen);
> >> +             if (err != PARSERUTILS_OK) {
> >> +                     return (uint32_t) -1;
> >> +             }
> >
> > If this function is called a lot, you'll find that
> > parserutils_charset_utf8_char_byte_length() is faster, as it doesn't
> > perform the conversion of utf8 to ucs4 for every character (it simply
> > considers the first byte of the UTF-8 character and returns the implied
> > byte length. Obviously, you'll need to convert the indexth character to
> > UCS-4 to return it.
> 
> I will try to change this.

Ok.

> >>
> >> +/* Walk the logic-adjacent text in document order */
> >> +static dom_exception walk_lat_order(dom_node_internal *node, 
> >> walk_operation opt,
> >> +             walk_order order, dom_string **ret, bool *cont);
> >> +/* Walk the logic-adjacent text */
> >> +static dom_exception walk_lat(dom_text *text, walk_operation opt,
> >> +             dom_string **ret);
> >
> > Can these two please be renamed to be more obvious -- "lat" is entirely
> > opaque to someone reading the code.
> >
> >> +dom_exception walk_lat_order(dom_node_internal *node, walk_operation opt,
> >> +             walk_order order, dom_string **ret, bool *cont)
> >
> > Needs documenting.
> >
> > It should also iterate across the subtree, not recurse. Recursion over
> > deep trees can result in stack overflows. Algorithms that recurse over
> > the tree should not appear anywhere in libdom.
> >
> >> +dom_exception walk_lat(dom_text *text, walk_operation opt,
> >> +             dom_string **ret)
> >
> > As above.
> 
> I will change this.

Thanks.

> >> --- src/core/element.c        (revision 8546)
> >> +++ src/core/element.c        (working copy)
> >> +/* Note: The DOM implentation usally use uppercase letter for element 
> >> names
> >> + *    and lowercase for attribute names.
> >> + */
> >
> > This is completely untrue. Case sensitivity entirely depends upon the
> > source document language. Libdom can make no assumptions about it.
> >
> >> +#define ID_ATTRIBUTE "id"
> >
> > Is there a need for this? The language binding is probably the place
> > that needs to be able to extract ID attributes.
> 
> I will change this.

Ok.

> >> +static dom_exception hashtable_clone(struct hash_table *sh,
> >> +             struct dom_document *sd, struct hash_table **th,
> >> +             struct dom_document *td, dom_node_internal *parent);
> >
> > Should this not be in hashtable.c?
> 
> There is only one place where I need to clone a hash_table, so I put
> it here. You are right it is in the hash_table, but putting it here is
> simple, because the Element now what is the key and value in the
> hash_table.

I disagree. It should be a generic function within hashtable.c.

Something like:

struct hash_table *hash_clone(struct hash_table *hash, 
                dom_alloc alloc, void *alloc_pw,
                void *(*clone_key)(void *key, void *pw, 
                                dom_alloc alloc, void *alloc_pw), 
                void *key_pw,
                void *(*clone_value)(void *value, void *pw,
                                dom_alloc alloc, void *alloc_pw),
                void *value_pw);

> >> +DIR_SOURCES := \
> >> +     string.c node.c \
> >> +     attr.c characterdata.c element.c \
> >> +     implementation.c impllist.c \
> >> +     text.c typeinfo.c comment.c \
> >> +     namednodemap.c nodelist.c \
> >> +     cdatasection.c document_type.c entity_ref.c pi.c \
> >> +     doc_fragment.c document.c
> >
> > Is there a reason why these are no longer in alphabetical order?
> 
> I rearrange the files according on their dependence and category.
> String and Node are types that all other types need.
> cdatasection/doc_type/entiry_ref/pi are XML related types.

Right.


J.

Re: Review: Bo Yang -- libdom Core

Reply via email to