[ https://issues.apache.org/jira/browse/CONNECTORS-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886744#action_12886744 ]
Karl Wright commented on CONNECTORS-56:
---------------------------------------

The general approach for an API that I'd suggest, which would be completely compatible with the Quick Start version of LCF, would basically consist of its own web application. The web application would consist completely of a servlet, which would interpret path and argument information as commands and posted command arguments, respectively. A separate web application would permit users to control access to the API using standard application server access management mechanisms.

All commands would consist of HTTP POSTs that address the API servlet. The commands themselves would come in broad categories, and be specified as part of the "path", as follows:

job/find - get existing job information
job/start - start a job
job/abort - abort a job
job/restart - restart a job
job/delete - delete a job
job/save - save a job
...
report/simplehistory - generate a simple history report
report/documentstatus - generate a document status report
...
etc.

As for arguments and return values, my sense is that we have a choice of either XML or JSON here. It's not clear which is better, but my preference would be JSON, because of its relative simplicity, and because otherwise people will be tempted to want us to use full SOAP, with WSDLs etc. That would add a lot of overhead to the solution, in my opinion.

The API commands would be stateless, in that there would be no explicit or implicit session created. All arguments are therefore explicit.

Connection configuration, document specification, and output specification information is the main problem. This information is easily mappable to unstructured XML, and is managed by LCF in this way. The content of the XML is determined wholly by the involved connector code, and is therefore opaque as far as the API is concerned. Embedding such XML as a JSON field is certainly possible, or it would even be possible to convert it to embedded or nested JSON.
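
To make the shape of this concrete, here is a minimal sketch in Python rather than an actual servlet. Everything named in it is an assumption for illustration only: the `handle_post` entry point, the `id` and `configuration_xml` argument names, the `_type`/`_attributes`/`_value`/`_children` keys, and the in-memory `JOBS` dictionary standing in for LCF's job storage. It shows the path selecting the command, a JSON body carrying every argument explicitly (no session), and one possible way to turn the opaque configuration XML into nested JSON.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical stand-in for LCF's persistent job storage (not a session).
JOBS = {}


def xml_to_nested(element):
    """Convert an opaque configuration XML node into a JSON-friendly dict.

    The key names used here are illustrative assumptions, not an LCF format.
    """
    node = {"_type": element.tag}
    if element.attrib:
        node["_attributes"] = dict(element.attrib)
    text = (element.text or "").strip()
    if text:
        node["_value"] = text
    children = [xml_to_nested(child) for child in element]
    if children:
        node["_children"] = children
    return node


def job_save(args):
    # Assumed argument names: "id" and "configuration_xml".
    config = ET.fromstring(args["configuration_xml"])
    JOBS[args["id"]] = {"id": args["id"], "configuration": xml_to_nested(config)}
    return {"status": "saved", "id": args["id"]}


def job_find(args):
    job = JOBS.get(args["id"])
    if job is None:
        return {"error": "no such job"}
    return {"job": job}


# The "path" part of the request selects the command.
COMMANDS = {"job/save": job_save, "job/find": job_find}


def handle_post(path, body):
    """Dispatch one command; the JSON body carries all arguments explicitly."""
    handler = COMMANDS.get(path.strip("/"))
    if handler is None:
        return json.dumps({"error": "unknown command: " + path})
    return json.dumps(handler(json.loads(body)))


if __name__ == "__main__":
    xml = '<configuration><parameter name="Server">example.com</parameter></configuration>'
    print(handle_post("/job/save", json.dumps({"id": "1", "configuration_xml": xml})))
    print(handle_post("/job/find", json.dumps({"id": "1"})))
```

In this sketch the client still posts the configuration as an XML string inside a JSON field, and the server hands it back as nested JSON. That is just one of the possible combinations; accepting nested JSON in both directions would be the variant that keeps XML out of the client entirely.
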
It's really not clear to me what the best approach is here, although embedded JSON would require fewer moving parts in the client. If this approach is to fly, somebody will need to document these opaque configuration structures, which will mean that these structures must remain backwards compatible as they evolve.

Thoughts?

> All features should be accessible through an API
> ------------------------------------------------
>
>                 Key: CONNECTORS-56
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-56
>             Project: Lucene Connector Framework
>          Issue Type: Improvement
>          Components: Framework core
>            Reporter: Jack Krupansky
>
> LCF consists of a full-featured crawling engine and a full-featured user
> interface to access the features of that engine, but some applications are
> better served with a full API that lets the application control the crawling
> engine, including creation and editing of connections and creation, editing,
> and control of jobs. Put simply, everything that a user can accomplish via
> the LCF UI should be doable through an LCF API. All LCF objects should be
> queryable through the API.
> A primary use case is Solr applications which currently use Aperture for
> crawling, but would prefer the full-featured capabilities of LCF as a
> crawling engine over Aperture.
> I do not wish to over-specify the API in this initial description, but I
> think the LCF API should probably be a traditional REST API, with some of
> the API elements specified via the context path, some parameters via URL
> query parameters, and complex, detailed structures as JSON (or similar). The
> precise details of the API are beyond the scope of this initial description
> and will be added incrementally once the high-level approach to the API
> becomes reasonably settled.
> A job status and event reporting scheme is also needed in conjunction with
> the LCF API. That requirement has already been captured as CONNECTORS-41.
> The intention for the API is to create, edit, access, and control all of the
> objects managed by LCF. The main focus is on repositories, jobs, and status,
> and less about document-specific crawling information, but there may be some
> benefit to querying crawling status for individual documents as well.
> Nothing in this proposal should in any way limit or constrain the features
> that will be available in the LCF UI. The intent is that LCF should continue
> to have a full-featured UI in addition to a full-featured API.
> Note: This issue is part of Phase 2 of the CONNECTORS-50 umbrella issue.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.