Leo's import system has many strengths. The most important is that code
will almost always import correctly, even if the resulting nodes aren't
close to optimal. This is particularly important for JavaScript, where
coding styles vary so widely.

After generating nodes, the import system calls the clean_headline method.
The base Importer class defines clean_headline as follows:

def clean_headline(self, s, p=None):
    '''Return the cleaned version of headline s.

    Will typically be overridden in subclasses.
    '''
    return s.strip()
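To make the calling pattern concrete, here is a minimal sketch built on simplified, hypothetical stand-ins: `Node`, `clean_all_headlines` and `BraceImporter` are illustrative names, not Leo's API, and Leo's real import code works on outline positions rather than this toy tree. It shows a post-pass calling clean_headline on every generated node, with a subclass overriding the base method.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for an outline node (not Leo's Position API)."""
    h: str  # the headline text
    children: List["Node"] = field(default_factory=list)


class Importer:
    """Simplified stand-in for Leo's base Importer class."""

    def clean_headline(self, s: str, p: Optional[Node] = None) -> str:
        """Return the cleaned version of headline s."""
        return s.strip()

    def clean_all_headlines(self, root: Node) -> None:
        """Hypothetical post-pass: clean every headline below root, recursively."""
        for child in root.children:
            child.h = self.clean_headline(child.h, child)
            self.clean_all_headlines(child)


class BraceImporter(Importer):
    """Toy subclass: reuses the base cleanup, then drops a trailing '{'."""

    def clean_headline(self, s: str, p: Optional[Node] = None) -> str:
        s = super().clean_headline(s, p)
        if s.endswith('{'):
            s = s[:-1].strip()
        return s


# Usage: build a tiny tree of "generated" nodes, then run the post-pass.
root = Node('root', [Node('  function foo() {  ', [Node(' if (x) { ')])])
BraceImporter().clean_all_headlines(root)
print(root.children[0].h)              # function foo()
print(root.children[0].children[0].h)  # if (x)
```

Calling super() keeps the base class's whitespace stripping, so each subclass only adds the cleanup that is specific to its language.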

The JavaScript importer now defines the method this way:

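import re

import leo.core.leoGlobals as g

clean_regex_list1 = [
    # (compiled regex patterns elided here)
]
clean_regex_list2 = [
    # (compiled regex patterns elided here)
]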

def clean_headline(self, s, p=None):
    '''Return a cleaned up headline s.'''
    s = s.strip()
    # Don't clean a headline twice.
    if s.endswith('>>') and s.startswith('<<'):
        return s
    for ch in '{(=':
        if s.endswith(ch):
            s = s[:-1].strip()
    # First regex cleanup.
    for pattern in self.clean_regex_list1:
        m = pattern.match(s)
        if m:
            s = m.group(1)
    # Second regex cleanup.
    for pattern in self.clean_regex_list2:
        m = pattern.match(s)
        if m:
            s = m.group(1) + m.group(2)
    s = s.replace('  ', ' ')
    return g.truncate(s, 100)
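Because the regex lists above are elided, the following is a self-contained sketch of the same cleanup steps under stated assumptions: the patterns in clean_regex_list1 and clean_regex_list2 are hypothetical examples, not the importer's real ones, and `truncate` is a simplified stand-in for g.truncate, whose exact behavior may differ. Only the order of operations mirrors the method above.

```python
import re


class JsHeadlineCleaner:
    """Stand-alone sketch of the JavaScript clean_headline steps.

    The regex lists are HYPOTHETICAL examples; they are not the
    patterns used by Leo's JavaScript importer.
    """

    # One-group patterns: keep only group(1).
    clean_regex_list1 = [
        re.compile(r'(?:var|let|const)\s+(.+)'),   # var x ...      -> x ...
        re.compile(r'function\s+(\w+\s*\(.*\))'),  # function f(a)  -> f(a)
    ]
    # Two-group patterns: keep group(1) + group(2).
    clean_regex_list2 = [
        re.compile(r'(\w+)\s*=\s*function\b\s*(\(.*\))'),  # f = function (a) -> f(a)
    ]

    @staticmethod
    def truncate(s, n):
        # Simplified stand-in for Leo's g.truncate; Leo's version may differ.
        return s[:n]

    def clean_headline(self, s, p=None):
        """Return a cleaned-up headline s."""
        s = s.strip()
        # Don't clean a headline twice.
        if s.endswith('>>') and s.startswith('<<'):
            return s
        # Drop a trailing '{', then '(', then '=' (each checked once, in order).
        for ch in '{(=':
            if s.endswith(ch):
                s = s[:-1].strip()
        # Each pattern sees the result of the previous one.
        for pattern in self.clean_regex_list1:
            m = pattern.match(s)
            if m:
                s = m.group(1)
        for pattern in self.clean_regex_list2:
            m = pattern.match(s)
            if m:
                s = m.group(1) + m.group(2)
        s = s.replace('  ', ' ')
        return self.truncate(s, 100)


c = JsHeadlineCleaner()
print(c.clean_headline('var foo = function (a, b) {'))  # foo(a, b)
print(c.clean_headline('function bar(x) {'))            # bar(x)
print(c.clean_headline('const config = {'))             # config
print(c.clean_headline('<< section >>'))                # << section >>
```

Because each list is applied to the progressively cleaned string, pattern order matters. Also, s.replace('  ', ' ') makes a single pass, so a run of three spaces becomes two rather than one.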

This isn't perfect, but it is much better than previous versions. I encourage
Vitalije or anyone else to suggest improvements.


You received this message because you are subscribed to the Google Groups 
"leo-editor" group.