[ 
https://issues.apache.org/jira/browse/CONNECTORS-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771810#comment-13771810
 ] 

Karl Wright edited comment on CONNECTORS-13 at 9/19/13 11:45 AM:
-----------------------------------------------------------------

Here's a proposal, then, which is consistent with how MCF currently does 
things, but also allows for zookeeperification where sensible.

(1) We keep the properties.xml file around.  It's the only thing you need to 
provide to MCF to get it started, and you can point at it with a -D switch.  
But:
(2) We add (optional) properties to it to configure Zookeeper - or, more 
precisely, the Zookeeper implementation of ILockManager.
(3) We add property access methods to ILockManager.  The file-based 
implementation of ILockManager simply vectors these property access methods 
back to properties.xml.
(4) We add a Zookeeper implementation of ILockManager, which fetches its 
startup properties from properties.xml, but which stores and retrieves 
properties accessed through ILockManager in Zookeeper.
(5) We change all appropriate property accesses throughout ManifoldCF to obtain 
their properties via ILockManager.
(6) As syntactic sugar, we can add property accessors to LockManagerFactory as 
well.
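
To make (2) through (4) concrete, here is a rough Java sketch of the shape I 
have in mind.  None of this is existing ManifoldCF code: PropertyAccess, the 
two implementations, the factory, and the property name 
"example.zookeeper.connectstring" are all made up for illustration, and a 
plain Map stands in for Zookeeper so the sketch runs with no extra 
dependencies.  A real version would put these methods on ILockManager itself.

```java
import java.util.Map;
import java.util.Properties;

/** Hypothetical property-access surface the proposal would add to ILockManager. */
interface PropertyAccess {
    /** Returns the property value, or null if it is not set. */
    String getProperty(String name);
}

/** File-based variant: every lookup goes straight back to the local properties. */
class FileBackedPropertyAccess implements PropertyAccess {
    private final Properties local; // stand-in for the parsed properties.xml

    FileBackedPropertyAccess(Properties local) {
        this.local = local;
    }

    @Override
    public String getProperty(String name) {
        return local.getProperty(name);
    }
}

/** Shared-store variant: startup settings come from the local file, the rest from the store. */
class SharedStorePropertyAccess implements PropertyAccess {
    private final String connectString;            // startup property read from the local file
    private final Map<String, String> sharedStore; // stand-in for data held in Zookeeper

    SharedStorePropertyAccess(String connectString, Map<String, String> sharedStore) {
        this.connectString = connectString;
        this.sharedStore = sharedStore;
    }

    String getConnectString() {
        return connectString;
    }

    @Override
    public String getProperty(String name) {
        return sharedStore.get(name);
    }
}

/** Chooses an implementation based on an optional (hypothetical) property in the local file. */
final class PropertyAccessFactory {
    static final String CONNECT_KEY = "example.zookeeper.connectstring"; // assumed name

    private PropertyAccessFactory() {
    }

    static PropertyAccess create(Properties local, Map<String, String> sharedStore) {
        String connect = local.getProperty(CONNECT_KEY);
        if (connect == null) {
            // No shared-store configuration present: behave exactly as today.
            return new FileBackedPropertyAccess(local);
        }
        return new SharedStorePropertyAccess(connect, sharedStore);
    }
}
```

The point of the sketch is that callers only ever see PropertyAccess; whether 
a value comes from the local file or from the shared store is decided once, 
at startup, by what is in properties.xml.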

Open questions:

(a) How do we SET properties in Zookeeper?  When does this happen, and by what 
process?
(b) Who writes the Zookeeper ILockManager implementation?  Anybody around who 
has some experience with this? ;-)
(c) For paths, such as where ManifoldCF looks for jars, etc., we obviously need 
to think through the solution.  What does Solr do in its Zookeeper incarnation 
in this case?

Thoughts?  Answers?  Ideas?

                
> We should move to eliminate process synchronization via shared file system, 
> and use a process/service instead
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-13
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-13
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Framework core
>    Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2
>            Reporter: Karl Wright
>             Fix For: ManifoldCF next
>
>
> The current implementation relies on the file system to synchronize activity 
> between various LCF processes.  This has several downsides: first, it is 
> possible to get the file system into a state that is corrupted (by killing 
> processes); second, this limits the future ability to spread crawler workload 
> over multiple machines.
> It should be reasonably straightforward, and probably more resilient, to 
> introduce a "synchronization process", which all other LCF processes talk to 
> in order to manage locks, shared data, and other synchronization activities.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
