http://nagoya.apache.org/bugzilla/show_bug.cgi?id=15724

WSDL2Java takes hours on large files

           Summary: WSDL2Java takes hours on large files
           Product: Axis
           Version: current (nightly)
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Critical
          Priority: Other
         Component: WSDL processing
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

I am running WSDL2Java on a WSDL file that uses a large schema (>900 types/elements). After parsing the files, the tool hangs for several hours during the "resolve references" phase of code generation. Using a profiler, I have identified several contributing factors.

1. QName creation. The symbolTable.Utils class creates hundreds of thousands of QName objects while walking the DOM to find derived types. Most of these are redundant, yet they are not cached. I will attach a file of profiler output that provides details, but a lot of time is spent creating these QName objects.

   Suggestion: introduce a QName cache in the Utils class -- a Map of Maps, where the outer map uses namespaces as keys and the inner maps use localNames as keys and QNames as values.

2. In ...axis.wsdl.symbolTable.SymbolTable, there is a poorly chosen data structure that yields O(n^2) performance (possibly worse) -- the "types" Vector is subjected to multiple linear searches, in some cases from within nested loops.

   Suggestion: replace the types Vector with two Maps, one for QName -> Element and one for QName -> Type.

3. In javax.xml.namespace.QName, the localName and namespaceURI are interned using String.intern(). This gets hit *a lot* from within org.apache.axis.wsdl.symbolTable.Utils.getNodeQName(). Removing the interned Strings and changing the QName.equals() implementation to use String.equals() instead of reference comparison yields a significant speedup.

4. org.apache.axis.wsdl.symbolTable.SchemaUtils.getComplexElementExtensionBase() gets called many, many times from recursive invocations of org.apache.axis.wsdl.symbolTable.Utils.getDerivedTypes(). The invocation count approaches O(n^2), where n is the size of the types collection.

   Suggestion: most of the invocations of getComplexElementExtensionBase(Node, SymbolTable) are redundant -- the extension base of the complex type defined within Node does not change across the recursive calls, but the search for that extension base is quite expensive. A cache of previously searched Nodes is very helpful here. However, this method is static, so this solution introduces a new problem -- how to scope the cache so that multiple instances of SymbolTable can coexist in the same VM? Two possibilities:

   * make the cache a parameter to the method
   * make the method an instance method rather than a class method
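
The QName cache suggested in item 1 might look roughly like the sketch below. The class name QNameCache and its methods are hypothetical and are not existing Axis code. The sketch also assumes a modern JDK (generics, lambdas), so it would need adapting to whatever JDK level Axis targets.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;

/** Hypothetical helper, not part of Axis: a Map of Maps that reuses QName instances. */
final class QNameCache {
    // Outer key: namespace URI. Inner key: local name. Value: the shared QName.
    private final Map<String, Map<String, QName>> byNamespace = new HashMap<>();

    /** Returns the cached QName for the pair, creating it on first use. */
    QName get(String namespaceURI, String localName) {
        // Treat a missing namespace as the empty namespace, as QName itself does.
        String ns = (namespaceURI == null) ? "" : namespaceURI;
        return byNamespace
                .computeIfAbsent(ns, k -> new HashMap<>())
                .computeIfAbsent(localName, k -> new QName(ns, localName));
    }

    /** Number of distinct QNames currently held. */
    int size() {
        int total = 0;
        for (Map<String, QName> inner : byNamespace.values()) {
            total += inner.size();
        }
        return total;
    }
}
```

With a cache like this, repeated lookups for the same namespace and local name return the same object, so the number of QName allocations is bounded by the number of distinct names in the schema rather than by the number of DOM nodes visited.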
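
For item 2, a rough sketch of the two-Map replacement for the "types" Vector follows. TypeIndex and its type parameters E and T are placeholders for illustration only; the real SymbolTable entry classes are not shown here. LinkedHashMap is an assumption, chosen in case other code relies on the insertion order the Vector provided.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.namespace.QName;

/** Hypothetical index, not Axis code: constant-time lookup by QName instead of a linear scan. */
final class TypeIndex<E, T> {
    // LinkedHashMap keeps insertion order, mimicking iteration over the old Vector.
    private final Map<QName, E> elements = new LinkedHashMap<>();
    private final Map<QName, T> types = new LinkedHashMap<>();

    void addElement(QName name, E element) { elements.put(name, element); }

    void addType(QName name, T type) { types.put(name, type); }

    /** Returns the element registered under name, or null if there is none. */
    E getElement(QName name) { return elements.get(name); }

    /** Returns the type registered under name, or null if there is none. */
    T getType(QName name) { return types.get(name); }

    /** All types in insertion order, for code that still needs to iterate. */
    Collection<T> allTypes() { return types.values(); }
}
```

Keeping elements and types in separate maps matters because an element and a type may share the same QName in a schema, which a single map keyed by QName could not represent.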
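
The first of the two possibilities in item 4 (passing the cache as a parameter) could be sketched as below. ExtensionBaseCache is a hypothetical name. The expensive search is represented by a generic function rather than the real getComplexElementExtensionBase, whose exact return type is not assumed here. Each SymbolTable would own one cache instance, which keeps separate SymbolTables in the same VM from sharing state.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-SymbolTable memo, not Axis code. N is the node type, T the lookup result. */
final class ExtensionBaseCache<N, T> {
    // Identity-based keys: two distinct DOM nodes are never treated as the same entry.
    private final Map<N, T> cache = new IdentityHashMap<>();
    private final Function<N, T> expensiveLookup;
    private int misses;

    ExtensionBaseCache(Function<N, T> expensiveLookup) {
        this.expensiveLookup = expensiveLookup;
    }

    /** Returns the memoized result for node, running the expensive lookup only once per node. */
    T get(N node) {
        if (cache.containsKey(node)) {
            return cache.get(node);
        }
        misses++;
        T base = expensiveLookup.apply(node);
        cache.put(node, base); // a null result ("no extension base") is cached too
        return base;
    }

    /** How many times the expensive lookup actually ran. */
    int misses() {
        return misses;
    }
}
```

Caching null results is deliberate: nodes without an extension base are searched just as expensively as nodes with one, so they benefit equally from memoization.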