http://nagoya.apache.org/bugzilla/show_bug.cgi?id=15724

WSDL2Java takes hours on large files

           Summary: WSDL2Java takes hours on large files
           Product: Axis
           Version: current (nightly)
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Critical
          Priority: Other
         Component: WSDL processing
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


I am running WSDL2Java on a WSDL file that uses a large schema (>900
types/elements).  After parsing the files, the tool stalls for several hours
during the "resolve references" phase of code generation.

Using a profiler, I have identified several contributing factors to this.

1. QName creation.  The symbolTable.Utils class creates hundreds of thousands of
QName objects while walking the DOM to find derived types.  Most of these are
redundant, yet they are not cached.  I will attach a file of profiler output
that provides details, but creating these objects is where a lot of the time is
spent.

Suggestion: introduce a QName cache in the Utils class -- a Map of Maps, where
the outer map uses namespaces as keys and the inner maps use localNames as keys
and QNames as values.
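
A minimal sketch of the cache suggested above.  The QNameCache class and its
get() method are hypothetical names, not existing Axis code, and the sketch
assumes a Java version with generics and lambdas:

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;

/** Hypothetical helper (not part of Axis): reuses QName instances. */
class QNameCache {
    // Outer key: namespace URI.  Inner key: local name.
    private final Map<String, Map<String, QName>> byNamespace = new HashMap<>();

    QName get(String namespaceURI, String localName) {
        // Treat a null namespace like the empty string so both share one entry.
        String ns = (namespaceURI == null) ? "" : namespaceURI;
        Map<String, QName> byLocalName =
                byNamespace.computeIfAbsent(ns, k -> new HashMap<>());
        // Create the QName only the first time this pair is requested.
        return byLocalName.computeIfAbsent(localName, k -> new QName(ns, localName));
    }
}
```

Keeping the cache in an instance rather than in a static field would let it be
discarded together with the SymbolTable that owns it.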

2. In ...axis.wsdl.symbolTable.SymbolTable, there's a poorly chosen
data structure that yields O(n^2) performance (possibly worse) -- the "types"
Vector is subjected to multiple linear searches, in some cases from within
nested loops.

        Suggestion:  replace the types Vector with two Maps, one for QName
-> Element and one for QName -> Type.
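
A rough sketch of the two-map layout, for illustration only.  SymbolEntry and
TypeIndex are hypothetical stand-ins, not the Axis classes, and the sketch
assumes each entry can say whether it is an element or a type:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.namespace.QName;

/** Hypothetical stand-in for a symbol table entry; not the Axis class. */
class SymbolEntry {
    final QName qname;
    final boolean isElement;

    SymbolEntry(QName qname, boolean isElement) {
        this.qname = qname;
        this.isElement = isElement;
    }
}

/** Hypothetical replacement for a Vector that is searched linearly. */
class TypeIndex {
    // LinkedHashMap keeps insertion order, as the Vector did.
    private final Map<QName, SymbolEntry> elements = new LinkedHashMap<>();
    private final Map<QName, SymbolEntry> types = new LinkedHashMap<>();

    void add(SymbolEntry entry) {
        (entry.isElement ? elements : types).put(entry.qname, entry);
    }

    // Each lookup is a hash probe instead of a scan over every entry.
    SymbolEntry getElement(QName qname) { return elements.get(qname); }
    SymbolEntry getType(QName qname) { return types.get(qname); }
}
```

Two maps are needed because an element and a type may share the same QName.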

3.      In javax.xml.namespace.QName, the localName and namespaceURI are
cached using String.intern().  This gets hit *a lot* from within
org.apache.axis.wsdl.symbolTable.Utils.getNodeQName().  Removing the
interned Strings and changing the QName.equals() implementation to use
String.equals() instead of reference comparison yields a significant
speedup.
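
To illustrate the change described above, here is a sketch of a value class
that skips String.intern() and compares by content.  SimpleQName is a
hypothetical class for illustration, not the javax.xml.namespace.QName source:

```java
import java.util.Objects;

/** Hypothetical QName-like class: no intern(), content-based equality. */
final class SimpleQName {
    private final String namespaceURI;
    private final String localPart;

    SimpleQName(String namespaceURI, String localPart) {
        // Store the strings as given; no String.intern() call.
        this.namespaceURI = (namespaceURI == null) ? "" : namespaceURI;
        this.localPart = Objects.requireNonNull(localPart, "localPart");
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleQName)) return false;
        SimpleQName other = (SimpleQName) o;
        // String.equals() instead of reference comparison.
        return localPart.equals(other.localPart)
                && namespaceURI.equals(other.namespaceURI);
    }

    @Override
    public int hashCode() {
        return 31 * namespaceURI.hashCode() + localPart.hashCode();
    }
}
```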

4. org.apache.axis.wsdl.symbolTable.SchemaUtils.getComplexElementExtensionBase()
gets called many, many times from recursive invocations of
org.apache.axis.wsdl.symbolTable.Utils.getDerivedTypes().  The invocation
count approaches O(n^2), where n is the size of the types collection.

        Suggestion: most of the invocations of
getComplexElementExtensionBase(Node, SymbolTable) are redundant -- the
extension base of the complex type defined within Node does not change
across the recursive calls, but the search for that extension base is quite
expensive.  A cache of previously searched Nodes is very helpful here.
However, this method is static, so this fix introduces a new
problem -- how to scope the cache so that multiple instances of SymbolTable
can coexist in the same VM?  Two possibilities:

*       make the cache a parameter to the method
*       make the method an instance method rather than a class method.
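
A sketch of the first possibility, with the cache passed in by the caller.
The names ExtensionBaseLookup and getExtensionBase are hypothetical, and the
"search" function stands in for the existing expensive DOM walk; none of this
is the actual Axis signature:

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical wrapper: memoizes an expensive per-node search. */
class ExtensionBaseLookup {
    /**
     * Returns the cached result for the node, running the search only on a
     * miss.  The caller owns the cache, so each SymbolTable can have its own.
     */
    static <N, B> B getExtensionBase(N node, Map<N, B> cache, Function<N, B> search) {
        // containsKey() lets "no extension base" (null) be cached as well.
        if (cache.containsKey(node)) {
            return cache.get(node);
        }
        B base = search.apply(node);
        cache.put(node, base);
        return base;
    }
}
```

A caller could pass a java.util.IdentityHashMap held by its SymbolTable
instance, which keys the cache on node identity and lets the cache be
garbage collected together with that SymbolTable.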
