[ 
https://issues.apache.org/jira/browse/CONNECTORS-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886744#action_12886744
 ] 

Karl Wright commented on CONNECTORS-56:
---------------------------------------

The general approach for an API that I'd suggest, which would be completely 
compatible with the Quick Start version of LCF, would basically consist of its 
own web application.  The web application would consist completely of a 
servlet, which would interpret path and argument information as commands and 
posted command arguments, respectively.  A separate web application would 
permit users to control access to the API using standard application server 
access management mechanisms.

All commands would consist of HTTP posts that address the API servlet.  The 
commands themselves would come in broad categories, and be specified as part of 
the "path", as follows:

job/find - get existing job information
job/start - start a job
job/abort - abort a job
job/restart - restart a job
job/delete - delete a job
job/save - save a job
...
report/simplehistory - generate a simple history report
report/documentstatus - generate a document status report
...
etc.
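
To make the path-as-command idea concrete, here is a minimal sketch of how a servlet-style path could be mapped onto a command. The command names are taken from the list above; the function name, the lookup table, and the error handling are assumptions for illustration only, not a proposed implementation.

```python
# Hypothetical sketch: only the command names come from the list above.
KNOWN_COMMANDS = {
    "job": {"find", "start", "abort", "restart", "delete", "save"},
    "report": {"simplehistory", "documentstatus"},
}


def parse_command(path_info):
    """Split a path such as '/job/start' into a (category, action) pair."""
    # Drop empty segments so leading and trailing slashes are tolerated.
    parts = [p for p in path_info.split("/") if p]
    if len(parts) != 2:
        raise ValueError("expected <category>/<action>, got %r" % path_info)
    category, action = parts
    if action not in KNOWN_COMMANDS.get(category, set()):
        raise ValueError("unknown command %s/%s" % (category, action))
    return category, action
```

With this table, parse_command("/job/start") yields the pair ("job", "start"), and a path outside the table is rejected before any command code runs.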

As for arguments and return values, my sense is that we have a choice of either 
XML or JSON here.  It's not clear which is better, but my preference would be 
JSON, because of its relative simplicity, and because otherwise people will be 
tempted to want us to use full SOAP, with WSDLs etc.  That would add a lot of 
overhead to the solution, in my opinion.

The API commands would be stateless, in that there would be no explicit or 
implicit session created.  All arguments are therefore explicit.
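
As a rough illustration of what a stateless call might look like from the client side, the sketch below builds (but does not send) an HTTP POST whose JSON body carries every argument explicitly. The base URL, the "job_id" argument name, and the helper function are all hypothetical; nothing here reflects a decided wire format.

```python
import json
import urllib.request


def build_command_request(base_url, command, arguments):
    """Build a POST request for one API command; no session or cookie is involved."""
    body = json.dumps(arguments).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/" + command,  # the command travels in the path
        data=body,                             # the arguments travel in the posted body
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical URL and argument name, for illustration only.
req = build_command_request(
    "http://localhost:8080/lcf-api", "job/start", {"job_id": "1234"}
)
```

Because each request is self-describing, a client can issue commands in any order and from any process without first establishing a login or session.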

Connection configuration, document specification, and output specification 
information is the main problem.  This information is easily mappable to 
unstructured XML, and is managed by LCF in this way.  The contents of the XML 
are determined wholly by the involved connector code, and are therefore opaque as 
far as the API is concerned.  Embedding such XML as a JSON field is certainly 
possible, or it would even be possible to convert it to embedded or nested 
JSON.  It's really not clear to me what the best approach is here, although 
embedded JSON would require fewer moving parts in the client.
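
The two options above can be compared with a small sketch. The XML snippet, its tag and attribute names, and the tag/attributes/text/children mapping are made up for illustration; real connector configurations would look different, and this generic mapping is only one of several possible conversions.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an XML element into a generic nested structure, recursively."""
    return {
        "tag": elem.tag,
        "attributes": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": [element_to_dict(child) for child in elem],
    }


# Hypothetical connector configuration; the names are not real LCF fields.
config_xml = (
    '<configuration>'
    '<parameter name="server">example.com</parameter>'
    '</configuration>'
)

# Option 1: carry the XML as an opaque string inside a JSON field.
embedded = json.dumps({"configuration": config_xml})

# Option 2: convert the XML into nested JSON before sending it.
nested = json.dumps({"configuration": element_to_dict(ET.fromstring(config_xml))})
```

With the first option the client still needs an XML parser to inspect or edit the configuration; with the second, a JSON library alone is enough, at the cost of settling on a conversion convention.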

If this approach is to fly, somebody will need to document these opaque 
configuration structures, which will mean that these structures must remain 
backwards compatible as they evolve.

Thoughts?


> All features should be accessible through an API
> ------------------------------------------------
>
>                 Key: CONNECTORS-56
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-56
>             Project: Lucene Connector Framework
>          Issue Type: Improvement
>          Components: Framework core
>            Reporter: Jack Krupansky
>
> LCF consists of a full-featured crawling engine and a full-featured user 
> interface to access the features of that engine, but some applications are 
> better served with a full API that lets the application control the crawling 
> engine, including creation and editing of connections and creation, editing, 
> and control of jobs. Put simply, everything that a user can accomplish via 
> the LCF UI should be doable through an LCF API. All LCF objects should be 
> queryable through the API.
> A primary use case is Solr applications which currently use Aperture for 
> crawling, but would prefer the full-featured capabilities of LCF as a 
> crawling engine over Aperture.
> I do not wish to over-specify the API in this initial description, but I 
> think the LCF API should probably be a traditional REST API, with some of 
> the API elements specified via the context path, some parameters via URL 
> query parameters, and complex, detailed structures as JSON (or similar). The 
> precise details of the API are beyond the scope of this initial description 
> and will be added incrementally once the high-level approach to the API 
> becomes reasonably settled.
> A job status and event reporting scheme is also needed in conjunction with 
> the LCF API. That requirement has already been captured as CONNECTORS-41.
> The intention for the API is to create, edit, access, and control all of the 
> objects managed by LCF. The main focus is on repositories, jobs, and status, 
> and less about document-specific crawling information, but there may be some 
> benefit to querying crawling status for individual documents as well.
> Nothing in this proposal should in any way limit or constrain the features 
> that will be available in the LCF UI. The intent is that LCF should continue 
> to have a full-featured UI, in addition to a full-featured API.
> Note: This issue is part of Phase 2 of the CONNECTORS-50 umbrella issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
