[jira] Commented: (LUCENE-1336) Distributed Lucene using Hadoop RPC based RMI with dynamic classloading

Jason Rutherglen (JIRA) Sat, 26 Jul 2008 08:15:22 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617188#action_12617188
 ]


Jason Rutherglen commented on LUCENE-1336:
------------------------------------------

The classloading mechanism described above was also found to be unsuitable, 
because it would require a scan of all of the classes each time.  Because of 
inheritance, it is impossible to accurately obtain all of the classes without 
a scan on each serialization, and that impacts performance too much.

In working on this problem I found what I think is a design flaw in Java, 
whose correction would solve many of these issues: the compiler does not 
compile a serialVersionUID into classes automatically if they do not define 
one.  The current design creates inconsistencies during deserialization in 
ObjectInputStream.resolveClass(ObjectStreamClass desc), where the 
ObjectStreamClass parameter returns a computed (not declared) 
serialVersionUID that can be inconsistent across VM implementations.  Also, 
because this serialVersionUID is only available from the ObjectStreamClass, 
it is difficult to create a map of classes and class versions.
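
As a rough illustration of the point above (a sketch, not code from the 
patch; the class names SerialVersionDemo, Declared and Undeclared are 
hypothetical), the following shows where the version comes from in each case.  
ObjectStreamClass.lookup and getSerialVersionUID are standard java.io calls.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical demo classes; the names are illustrative and not from the patch.
class SerialVersionDemo {

    // Declares its version explicitly, so the reported value is always 42.
    static class Declared implements Serializable {
        private static final long serialVersionUID = 42L;
        int value;
    }

    // Declares no version: the serialization runtime computes one by hashing
    // the class name, modifiers, interfaces, fields and methods.
    static class Undeclared implements Serializable {
        int value;
    }

    // Returns the serialVersionUID recorded in the class's ObjectStreamClass
    // descriptor, the same kind of object that resolveClass receives.
    static long versionOf(Class<?> cls) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
        return desc.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("declared:   " + versionOf(Declared.class));
        // The computed value depends on compiler-generated details of the
        // class, so no particular number should be expected here.
        System.out.println("undeclared: " + versionOf(Undeclared.class));
    }
}
```

The value for the undeclared class is only reachable through the 
ObjectStreamClass, which is what makes building a class-to-version map 
awkward.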

The easiest, most reliable and most efficient solution is a session-based 
classloading mechanism, where the session is between a client and the server.  
The client generates a unique session id every time the VM (or, in J2EE, the 
webapp) is loaded.  This mostly guarantees that the classes on the client will 
be consistent within a session (if the client is dynamically loading classes, 
it is the client's responsibility to restart the RMI object, which generates a 
new session id).  The server maintains a SessionClassLoader per client session 
that is used by the deserialization code to dynamically load classes from the 
client.  The only limitation of this solution is the number of 
SessionClassLoaders a server can support.  In most systems it will not be a 
factor.  The SessionClassLoaders on the server will simply expire from the map 
after a period of not being used, rather than use remote referencing, which 
would increase network traffic unnecessarily.
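
A minimal sketch of how such a session-based scheme could be structured is 
given below.  It is an assumption-laden illustration, not the code in 
lucene-1336.patch: apart from the SessionClassLoader name, every type and 
method here (ClassBytesSource, SessionRegistry, SessionObjectInputStream, 
loaderFor, expireIdle) is hypothetical, and the RPC call that fetches class 
bytes from the client is reduced to an interface.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names throughout; this is not the implementation from the patch.
class SessionClassLoading {

    // Stand-in for the RPC call that asks the client for a class file's bytes.
    interface ClassBytesSource {
        byte[] fetch(String className) throws IOException;
    }

    // One loader per client session; classes the server lacks come from the client.
    static class SessionClassLoader extends ClassLoader {
        private final ClassBytesSource source;
        volatile long lastUsedMillis;

        SessionClassLoader(ClassLoader parent, ClassBytesSource source, long now) {
            super(parent);
            this.source = source;
            this.lastUsedMillis = now;
        }

        // Called only after the parent loader has failed to find the class.
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            try {
                byte[] bytes = source.fetch(name);
                if (bytes == null) throw new ClassNotFoundException(name);
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    // Server-side map from session id to loader, with idle expiry.
    static class SessionRegistry {
        private final Map<String, SessionClassLoader> loaders = new ConcurrentHashMap<>();
        private final long idleTimeoutMillis;

        SessionRegistry(long idleTimeoutMillis) {
            this.idleTimeoutMillis = idleTimeoutMillis;
        }

        // Returns the session's loader, creating it on first use, and marks it as used.
        SessionClassLoader loaderFor(String sessionId, ClassBytesSource source, long now) {
            SessionClassLoader loader = loaders.computeIfAbsent(sessionId,
                id -> new SessionClassLoader(SessionRegistry.class.getClassLoader(), source, now));
            loader.lastUsedMillis = now;
            return loader;
        }

        // Drops loaders that have been idle for longer than the timeout.
        void expireIdle(long now) {
            loaders.values().removeIf(l -> now - l.lastUsedMillis > idleTimeoutMillis);
        }

        int size() {
            return loaders.size();
        }
    }

    // Deserializer that resolves classes through the session's loader.
    static class SessionObjectInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        SessionObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            try {
                return Class.forName(desc.getName(), false, loader);
            } catch (ClassNotFoundException e) {
                // Fall back to the default lookup, which also handles primitives.
                return super.resolveClass(desc);
            }
        }
    }
}
```

Passing the clock value in as a parameter keeps the expiry logic easy to 
exercise; a real server would more likely call expireIdle from a periodic 
task using System.currentTimeMillis().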


> Distributed Lucene using Hadoop RPC based RMI with dynamic classloading
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1336
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1336
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 2.3.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: lucene-1336.patch, lucene-1336.patch, lucene-1336.patch
>
>
> Hadoop RPC based RMI system for use with Lucene Searchable.  Keeps the 
> application logic on the client side while removing the need to deploy 
> application logic to the Lucene servers.  Removes the need to provision new 
> code to potentially hundreds of servers for every application logic change.  
> The use case is any deployment requiring Lucene on many servers.  This system 
> provides the added advantage of allowing custom Query and Filter classes (or 
> other classes) to be defined on, for example, a development machine and 
> executed on the server without deploying the custom classes to the servers 
> first.  This can save a lot of time and effort in provisioning and restarting 
> processes.  In the future this patch will include an IndexWriterService 
> interface which will enable document indexing.  This will allow subclasses of 
> Analyzer to be dynamically loaded onto a server as documents are added by the 
> client.
> Hadoop RPC is more scalable than Sun's RMI implementation because it uses 
> non-blocking sockets.  Hadoop RPC is also far easier to understand and 
> customize if needed, as it is embodied in two main classes, 
> org.apache.hadoop.ipc.Client and org.apache.hadoop.ipc.Server.  
> Features include automatic dynamic classloading.  The dynamic classloading 
> enables newly compiled client classes inheriting core objects such as Query 
> or Filter to be used to query the server without first deploying the code to 
> the server.  
> RMI dynamic classloading is not used in practice because it is hard to 
> set up: it requires placing the new code in jar files on a web server on the 
> client, setting custom system properties, and configuring the Java 
> security manager.  
> The dynamic classloading in Hadoop RMI for Lucene uses RMI to load the 
> classes.  Custom serialization and deserialization manages the classes and 
> the class versions on the server and client side.  New class files are 
> automatically detected and loaded using ClassLoader.getResourceAsStream and 
> so this system does not require creating a JAR file.  The same networking 
> system used for the remote method invocation is also used for loading 
> classes over the network.  This removes the need for a separate web server 
> dedicated to the task and reduces deployment to a few lines of code.
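
For reference, reading a class file's bytes with 
ClassLoader.getResourceAsStream, as the description mentions, can be 
sketched as follows.  This is a hedged illustration rather than the patch's 
code: the ClassBytes class and its read method are hypothetical names, and 
sending the bytes over Hadoop RPC is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper; not part of lucene-1336.patch.
class ClassBytes {

    // Reads the compiled bytes of a class through a class loader, without a JAR.
    static byte[] read(Class<?> cls) {
        String resource = cls.getName().replace('.', '/') + ".class";
        ClassLoader loader = cls.getClassLoader();
        if (loader == null) {
            // Classes from the bootstrap loader report a null loader.
            loader = ClassLoader.getSystemClassLoader();
        }
        try (InputStream in = loader.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("no class file found for " + cls.getName());
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned array is what a server-side loader would hand to defineClass.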

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


