The use cases I was considering for database issues are:
1) Desire for a very simple evaluation install process. See the Solr tutorial.
2) Desire for a less complex and faster application deployment install process. PostgreSQL has a reputation for having "a large footprint."
Now, as machines and software evolve, it is not completely clear to me how "bad" PostgreSQL is these days, but having a separate deployment step to accommodate PostgreSQL interferes with use case #1.
That said, I am not sure that I would hold up getting the first official release of LCF out the door. After all, leading-edge ("bleeding-edge") users are used to more than a little inconvenience. Still, a Solr-simple evaluation install would be... "sweet".
-- Jack Krupansky

--------------------------------------------------
From: <karl.wri...@nokia.com>
Sent: Friday, May 28, 2010 2:17 PM
To: <connectors-dev@incubator.apache.org>
Subject: RE: Proposal for simple LCF deployment model

I've been fighting with Derby for two days. It's missing a significant amount of important functionality, and its user and database model are radically different from all other databases I know of. (I'm also getting nonsense exceptions from it, but that's another matter.) So regardless of how good the database abstraction layer is, expecting all databases to have sufficient functionality to get anything done is ridiculous. If I get Derby working, I will let you know whether it is feasible at all to run LCF on it under any circumstances, but that *cannot* be the primary database people use with this project.

I'm also still waiting for a use-case from you as to how getting rid of the PostgreSQL database makes your life easier at all - and if your use case involves using Derby for anything serious, I'll have to say that I don't think that's realistic.

LCF has a very clean connector abstraction today. So all we're really talking about is the build process here - whether it is possible to separate build and deployment of the framework and some connectors from the builds of other connectors. Having each connector run as a separate process seems like overkill and would also impact performance pretty dramatically, as well as requiring quite a bit of additional configuration. The "Solr plug-in model" is a bit better and requires only the addition of a custom classloader that explicitly loads any plugin classes and any classes that those use. The required defines that some libraries need would have to be solved, but that needs doing anyway and I think I can have individual connectors set these as needed.

Karl

-----Original Message-----
From: ext Jack Krupansky [mailto:jack.krupan...@lucidimagination.com]
Sent: Friday, May 28, 2010 1:49 PM
To: connectors-dev@incubator.apache.org
Subject: Re: Proposal for simple LCF deployment model

But for a basic, early evaluation, "test drive", just the file system and web repository connectors should be sufficient.
And if there is a clean database abstraction, a basic database package (e.g., Derby) should be sufficient for such a basic evaluation.

Are there technical reasons why third-party repository connectors cannot be supported using a Solr-style "plug-in" approach? Or, worst case, as separate processes with a clean inter-process API? Maybe not in the near-term, but as a longer-term vision.

-- Jack Krupansky

--------------------------------------------------
From: <karl.wri...@nokia.com>
Sent: Friday, May 28, 2010 11:10 AM
To: <connectors-dev@incubator.apache.org>
Subject: Re: Proposal for simple LCF deployment model

You forget that building LCF in its entirety requires that you supply proprietary client components from third-party vendors. So I think it is unrealistic to expect canned builds that contain everything that you just deploy. For LCF I think the build cycle will thus be very common. Getting rid of the database requirement is also obviously not an option.

Karl

--- original message ---
From: "ext Jack Krupansky" <jack.krupan...@lucidimagination.com>
Subject: Re: Proposal for simple LCF deployment model
Date: May 28, 2010
Time: 10:42:17 AM

A simple deployment a la Solr is a good goal. Integrating Jetty with the LCF deployment will go a long way towards that goal. The database software deployment (PostgreSQL) is the other half of the hassle with deploying LCF.

I think there are three distinct goals here: 1) A super-easy Solr-style deployment for initial evaluation of LCF, 2) deployment of the LCF components for full-blown application development where app server and database might need to be different from the initial evaluation, and 3) deployment of LCF components for production deployment of the full application.

Right now, evaluation of LCF requires deployment of the source code and building artifacts - Solr evaluation does not require that step. Eliminating the source and build step will certainly help simplify the evaluation process.

Another possible consideration is that although some of us are especially interested in integration with Solr and doing so easily and robustly, Solr is just one of the output connections and LCF could be deployed for applications that do not involve Solr at all. So, maybe there should be an extra deployment wiki page for Solr guys that focuses on use of LCF with Solr and related issues. Whether that should be the default presentation in the doc is a matter for debate. Right now, I see no harm with a Solr bias. At least it is a convenient way to demonstrate end-to-end use of LCF.

-- Jack Krupansky

--------------------------------------------------
From: <karl.wri...@nokia.com>
Sent: Friday, May 28, 2010 5:48 AM
To: <connectors-dev@incubator.apache.org>
Subject: Proposal for simple LCF deployment model

The current LCF standard deployment model requires a number of moving parts, which are probably necessary in some cases, but simply introduce complexity in others. It has occurred to me that it may be possible to provide an alternate deployment model involving Jetty, which would reduce the number of moving parts by one (by eliminating Tomcat). A simple LCF deployment could then, in principle, look pretty much like Solr's.

In order for this to work, the following has to be true:
(1) Jetty's basic JSP support must be comparable to Tomcat's.
(2) The class loader that Jetty uses for webapps must provide class isolation similar to Tomcat's. If this condition is not met, we'd need to build both a Tomcat and a Jetty version of each webapp.

The overall set of changes that would be required would be the following:
(a) An alternative "start" entry point would need to be coded, which would start Jetty running the lcf-crawler-ui and lcf-authority-service webapps before bringing up the agents engine.
(b) The alternative starting point should probably autocreate the database, and should also autoregister all connectors.
This will require a list, somewhere, of the connectors and authorities that are included, and their preferred UI names for that installation. This could come from the configuration information, or from some other place. Any ideas?
(c) There would need to be an additional jar produced by the build process, which would be the equivalent of the Solr start.jar, so as to make running the whole stack trivial.
(d) An "LCF API" web application, which provides access to all of the current LCF commands, would also be an obvious requirement to go forward with this model.

What are the disadvantages? Well, I think that the main problem would be security. This deployment model, though simple, does not control access to LCF in any way. You'd need to introduce another moving part to do that.

Bear in mind that this change would still not allow LCF to run using only one process. There are still separate RMI-based processes needed for some connectors (Documentum and FileNet). Although these could in theory be started up using Java Activation, a main reason for a separate process in Documentum's case is that DFC randomly crashes the JVM under which it runs, and thus needs to be independently restarted if and when it dies. If anyone has experience with Java Activation and wants to contribute their time to develop infrastructure that can deal with that problem, please let me know.

Finally, there is no way around the fact that LCF requires a well-performing database, which constitutes an independent moving part of its own. This proposal does nothing to change that at all.

Please note that I'm not proposing that the current model go away, but rather that we support both.

Thoughts?
Karl
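
One possible shape for the connector list raised in item (b) of the proposal is sketched here, assuming the list lives in a Java properties file that maps each connector class name to its preferred UI name. The file format, the class name ConnectorListSketch, and any connector class names passed to it are assumptions made for illustration; nothing here is an existing LCF API, and the thread does not settle where the list should come from.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper: not part of LCF. It only parses the list; the actual
// registration call would be made by whatever "start" entry point is written.
final class ConnectorListSketch {

    private ConnectorListSketch() {
    }

    /**
     * Parses "connector class name = UI name" pairs in java.util.Properties
     * syntax. The result is sorted by class name so that a start-up routine
     * registers connectors in the same order on every run.
     */
    static Map<String, String> parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        Map<String, String> connectors = new TreeMap<>();
        for (String className : props.stringPropertyNames()) {
            connectors.put(className, props.getProperty(className).trim());
        }
        return connectors;
    }
}
```

A start entry point could read such a file once at startup, iterate over the returned map, and register each class under its UI name. Keeping the list in configuration rather than in code would let an installation that lacks the proprietary client libraries simply leave those connectors out.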