Lee,

I wasn't sure what to post here. I ended up writing about 1,000 words ranting about multithreading and shared resources, so I will give you the shorter version (under 350 words ;-) ).

Here are my thoughts:

1) The Xerces DOM is not thread-safe. I believe over 90% of enterprise Java XML code today uses this API. So perhaps you should rethink your previous comment:

However, this also limits the use of DOM4J in enterprise-wide
applications, and I am not aware of any other XML document structure with
the same limitation...

Please look over the FAQ for Xerces at http://xml.apache.org/xerces2-j/faq-dom.html .


<snip>

Is Xerces DOM implementation thread-safe?

No. DOM does not require implementations to be thread safe. If you need to access the DOM from multiple threads, you are required to add the appropriate locks to your application code.

</snip>

So, to dispel any misconceptions:

DOM4J IS A GREAT API FOR DOING ENTERPRISE WIDE XML APPLICATIONS !

2) I see a theme running through these posts: shared resources in a multithreaded environment.

I recommend reading Patterns in Java, Volume 1, focusing on the Producer-Consumer pattern, and putting a Command-pattern object on its queue. The command object should hold a copy of the data needed to execute the "sub-job"; working on a copy avoids thread-locking issues. And if you later decide to modify the "Producer" of these sub-jobs, you will not have to modify the "Consumer" that handles them.


3) For the simplest designs, I am glad it is implemented the way it currently is. Otherwise I would be paying a significant synchronization penalty when I don't need it.


4) I believe you are suffering from a tightly coupled design. The reason folks don't like to perform two-phase commits against multiple resources is...
yes, performance. Please do not ask the remaining 90% of us to suffer for your enhancement request.


5) I hope that using the "NonLazyDocumentFactory" is an effective workaround. But I wonder whether you are taking advantage of decoupling so that your threads can run without blocking on shared resources.

Maybe you and I are not on the same page. If nothing else, perhaps you will review the Producer-Consumer pattern.
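A minimal sketch of the Producer-Consumer plus Command arrangement from point 2, using only java.util.concurrent. SubJob, SendEmailJob and the {recipient, body} pairs are hypothetical stand-ins for real sub-jobs; the point is that the producer copies plain values out of the document before enqueuing, so the consumer thread never touches a DOM node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Command pattern: each sub-job is an object that carries everything it needs.
interface SubJob {
    void execute();
}

// Hypothetical sub-job. The producer copies the recipient and body out of
// the parsed document as plain Strings, so the consumer never sees the DOM.
final class SendEmailJob implements SubJob {
    private final String recipient;
    private final String body;
    private final List<String> outbox; // only ever written by the consumer thread

    SendEmailJob(String recipient, String body, List<String> outbox) {
        this.recipient = recipient;
        this.body = body;
        this.outbox = outbox;
    }

    @Override
    public void execute() {
        outbox.add("email to " + recipient + ": " + body);
    }
}

final class ProducerConsumerDemo {
    // Sentinel ("poison pill") that tells the consumer to stop.
    private static final SubJob STOP = () -> { };

    // Each String[] is a hypothetical {recipient, body} pair already
    // extracted from the document by the producer thread.
    static List<String> run(List<String[]> extracted) {
        BlockingQueue<SubJob> queue = new LinkedBlockingQueue<>();
        List<String> outbox = new ArrayList<>();

        // Consumer: blocks on take() until a command arrives, then runs it.
        Thread consumer = new Thread(() -> {
            try {
                for (SubJob job = queue.take(); job != STOP; job = queue.take()) {
                    job.execute();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            // Producer: wraps copied data in commands and drops them on the queue.
            for (String[] pair : extracted) {
                queue.put(new SendEmailJob(pair[0], pair[1], outbox));
            }
            queue.put(STOP);
            consumer.join(); // join() makes the consumer's writes to outbox visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return outbox;
    }
}
```

With a single consumer the jobs run in FIFO order. Adding consumers raises throughput but gives up that ordering, and each consumer would then need its own STOP sentinel.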

Best wishes in your efforts,

Dave


Lee, William wrote:
It seems all I need is the "NonLazyDocumentFactory" and the
"NonLazyElement" defined in the dom4j.util package.

Is there a way to set up dom4j to use this factory and element by
default? I mean, is there a setting that DocumentHelper, SAXParser,
etc. will pick up?

Thanks again for all your help.
William.

-----Original Message-----
From: Lee, William
Sent: Saturday, April 12, 2003 9:41 AM
To: 'David D. Lucas'; Lee, William
Cc: 'Mike Skells'; [EMAIL PROTECTED]
Subject: RE: [dom4j-dev] Thread-safe issue?



Hi David,


First of all, thanks for the suggestion. I also agree that there are
countless ways to write a multithreaded program without passing
the DOM around.

However, that is not the issue I'm looking at. The issue is whether we
need an exclusive lock for all element operations.

The closest analogy I can think of is Java's HashMap, which is known to be
thread-unsafe: an exclusive lock is required for any update operation.
However, only a "read lock" is required for navigation (element lookup),
and concurrent lookups will not corrupt the map's internal structure.
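The HashMap discipline described here can be sketched with the JDK's ReentrantReadWriteLock: lookups share the read lock, updates take the exclusive write lock. This is a minimal illustration, not dom4j code, and the class name ReadMostlyRegistry is hypothetical. The scheme is only sound because HashMap.get() does not restructure the map; an access-ordered LinkedHashMap, for example, modifies itself on get() and would break it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper: many concurrent readers, one exclusive writer.
final class ReadMostlyRegistry<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    V get(K key) {
        lock.readLock().lock();   // shared: other readers may hold it too
        try {
            return map.get(key);  // safe only because get() never mutates a HashMap
        } finally {
            lock.readLock().unlock();
        }
    }

    void put(K key, V value) {
        lock.writeLock().lock();  // exclusive: blocks all readers and writers
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```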

As things stand, here is what we have for DOM4J:

"An EXCLUSIVE lock is required for ALL element operations, not only update
operations. Failing to hold one can corrupt the DOM's internal structure,
after which the DOM is no longer valid to use."

This also matches the disclaimer in your message:

'Document parsing and building is expected to be done and the results used
inside the same thread.'

However, this also limits the use of DOM4J in enterprise-wide
applications, and I am not aware of any other XML document structure with
the same limitation...

Since this is caused by the lazy array initialization in DefaultElement,
what we are discussing here is the possibility of removing this cleverness
so that at least the DOM can be read or navigated from multiple threads
without the EXCLUSIVE lock (of course, a READ lock is still required in case
other threads are updating the DOM).
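Why lazy initialization turns a read into a write can be shown with a deliberately simplified pair of classes. LazyNode and EagerNode are hypothetical and are not dom4j's DefaultElement source; they only illustrate the general pattern under discussion. The lazy version allocates its child list on first access, so unsynchronized readers race on that assignment; the eager version allocates once up front, so its reads never mutate state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simplification, NOT dom4j's DefaultElement.
final class LazyNode {
    private List<String> children; // allocated on first access

    // A "read" that writes: two threads calling this concurrently can both
    // see null and each allocate a list. The later assignment can replace the
    // earlier one, dropping anything already added to it, and without
    // synchronization neither thread is guaranteed to see the other's write.
    List<String> children() {
        if (children == null) {
            children = new ArrayList<>();
        }
        return children;
    }
}

final class EagerNode {
    // Allocated once in the constructor; reading never changes the node.
    private final List<String> children = new ArrayList<>();

    List<String> children() {
        return Collections.unmodifiableList(children);
    }

    // Updates still need an external exclusive (write) lock, and readers need
    // the read lock whenever updates can happen concurrently.
    void addChild(String name) {
        children.add(name);
    }
}
```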

I'll leave it to you folks to decide whether this limitation is by
design or is an issue...

Again, thanks for all your time and effort on this issue.

Regards,
William.

By the way, the changes I made only fixed half the problem. The problem
still exists for elements without any children, and the full fix is to remove
the "lazy array initialization" entirely. I'll run some performance benchmarks
later to see the effect of these changes.


-----Original Message-----
From: David D. Lucas [mailto:[EMAIL PROTECTED]
Sent: Friday, April 11, 2003 3:20 PM
To: Lee, William
Cc: 'Mike Skells'; [EMAIL PROTECTED]
Subject: Re: [dom4j-dev] Thread-safe issue?



The current design is not "thread-hot", a term that has followed me since my days with C++ and RogueWave software. Collections are typically not thread-hot by default: making them so requires synchronizing every mutation of the collection and of any iterators it hands out, which is a major performance hit. Just look at the impact Vector had on us.
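For comparison, here is what partial "thread-hotness" looks like on the plain JDK side: Collections.synchronizedList wraps every individual call in a lock (much as Vector does), yet iteration still has to be locked by hand, as the Collections javadoc requires. A small sketch; the helper names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SynchronizedIteration {
    // Each single call (add, get, size...) on the wrapper locks the list's
    // monitor: the per-operation cost that Vector always pays.
    static List<String> newSharedList() {
        return Collections.synchronizedList(new ArrayList<>());
    }

    // Iteration is NOT protected by the wrapper. The caller must hold the
    // list's own monitor for the whole traversal.
    static int totalLength(List<String> sharedList) {
        int total = 0;
        synchronized (sharedList) {
            for (String s : sharedList) {
                total += s.length();
            }
        }
        return total;
    }
}
```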


I vote for an implementation that separates the different kinds of processing. You have one thread processing XML. Why not have the main thread create "requests", using the Command pattern, for each type of input you receive? As you process the document, you drop the work off in a Queue that the worker threads pick up and process. This gives a cleaner separation of concerns: handling the incoming document is kept separate from processing the "logical request".

Here is an example:

Thread1                                     WorkerThread2
   |                                              |
   +parse XML                Queue.pop() <--------+ blocked (empty)
   |                                              |
   +identify work                                 |
   |                                              |
   +create work request                           |
   |                                              |
   +drop off work       ---> Queue.push()         |
   |                                              + Work received
   +identify more work                            |
   ...                                            + processes the work


This way, if there is a lot of asynchronous work, you can scale by increasing the number of workers. This implementation puts the synchronization on the Queue and takes it off the remaining threads.
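The diagram's queue-plus-workers shape is what the JDK's Executors.newFixedThreadPool provides out of the box: N threads taking tasks from one internal queue. A minimal sketch, assuming the parsing thread has already reduced each work item to an immutable String; String::length is a placeholder for the real request handling, and WorkerPoolDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class WorkerPoolDemo {
    // A fixed pool is `workers` threads pulling tasks off one shared queue;
    // scaling up means passing a larger `workers` value.
    static List<Integer> processAll(List<String> workItems, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String item : workItems) {
                // Each task captures an immutable String, not a shared DOM node.
                Callable<Integer> task = item::length; // placeholder for real work
                pending.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : pending) {
                try {
                    results.add(f.get()); // waits for that particular task
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
            }
            return results;
        } finally {
            pool.shutdown(); // lets the worker threads exit once the queue drains
        }
    }
}
```

Collecting the Futures is optional; for fire-and-forget sub-jobs the submitting thread can simply move on after submit().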


Another idea concerns blocking. When a thread blocks, we don't want it blocked because of another thread, unless it is idle, e.g. waiting on the Queue. WorkerThread2 will process the incoming request and may eventually block on I/O, such as a database call.

Blocking in the middle of work could cause further threading problems if that thread is holding a lock on an Element or a List shared with other threads.
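One common way to avoid holding a shared lock across blocking I/O is to copy what you need while holding the lock, release it, and only then do the slow work. A minimal sketch; SnapshotThenWork and the slowSink callback (standing in for, say, a database write) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class SnapshotThenWork {
    private final Object lock = new Object();
    private final List<String> shared = new ArrayList<>();

    void add(String s) {
        synchronized (lock) {
            shared.add(s);
        }
    }

    // Copies the shared data under the lock, releases it, and only then calls
    // the possibly slow, possibly blocking sink. Other threads calling add()
    // are never stuck behind this thread's I/O.
    void flush(Consumer<List<String>> slowSink) {
        List<String> snapshot;
        synchronized (lock) {
            snapshot = new ArrayList<>(shared);
            shared.clear();
        }
        slowSink.accept(snapshot); // runs with no lock held
    }
}
```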

I maintain that DOM4J was not intended for passing Elements between threads, only that XML parsing and building in one thread would not impact other threads' ability to parse and build. Only resources shared internally by DOM4J are protected. Document parsing and building is expected to be done and the results used inside the same thread.

If I am way off base here, someone jump in and correct me.
But I do large-scale software development for a living and have used patterns similar to those described above with great success.


I hope this makes sense. Just my 2 cents.

Dave






Lee, William wrote:


I agree 100% that "Optimal performance can only be achieved when multiple parts produce minimal friction." But what we have now is that the dom4j DOM itself is the source of friction across threads...

What we have is a main thread that accepts an XML request and then spawns worker threads to do different jobs based on the input (sending email, running a report, and so on). The advantage of using multiple threads here is that the main thread doesn't need to block until all jobs are completed (for example, one sub-job writes a server log, and it really doesn't matter whether it completes).

Notice that this problem exists even if we never modify the DOM once it is created. And I still can't believe that a dom4j DOM should NOT be read from multiple threads...

From what I see, DefaultElement is not thread-safe not only during creation but also during read operations, which limits dom4j to use within a single thread. I think that is an issue.

Hope this clarifies the problem.
Thanks again for all your help.
William.





--

+------------------------------------------------------------+
| David Lucas                        mailto:[EMAIL PROTECTED]  |
| Lucas Software Engineering, Inc.   (740) 964-6248 Voice    |
| Unix,Java,C++,CORBA,XML,EJB        (614) 668-4020 Mobile   |
| Middleware,Frameworks              (888) 866-4728 Fax/Msg  |
+------------------------------------------------------------+
| GPS Location:  40.0150 deg Lat,  -82.6378 deg Long         |
| IMHC: "Jesus Christ is the way, the truth, and the life."  |
| IMHC: "I know where I am; I know where I'm going."    <><  |
+------------------------------------------------------------+

Notes: PGP Key Block=http://www.lse.com/~ddlucas/pgpblock.txt
IMHO="in my humble opinion" IMHC="in my humble conviction"
All trademarks above are those of their respective owners.




_______________________________________________
dom4j-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-dev
