[
https://issues.apache.org/jira/browse/CONNECTORS-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002049#comment-14002049
]
Karl Wright commented on CONNECTORS-916:
----------------------------------------
Hi Takumi,
I see a couple of issues with this patch.
(1) There is already a poi download; you can't download two different versions
of poi to the same place and have anything sensible happen.
(2) The patch includes a large number of new jars. Somebody needs to do
research to find out with the license is for each of these jars, and they will
need to be mentioned in the appropriate license file. If any of the jars is
not an Apache-approved license, something else will need to be done. If you
can summarize the jars being added and their licenses, I can write the actual
license update.
(3) There's some nonsense code in the build file I don't understand, for
example:
{code}
+ <include name="poi-ooxml*.jar"/>
+ <include name="poi-ooxml*.jar"/>
+ <include name="poi-ooxml*.jar"/>
+ <include name="poi-ooxml*.jar"/>
+ <include name="poi-ooxml*.jar"/>
{code}
(4) I see a lot of diffs of this kind, which don't appear to be changing
anything. What is happening here?
{code}
- {
- // Establish a session
+ {
+ // Establish a session
getSession();
SpecPacker sp = new SpecPacker(outputDescription);
- String jsondata = "";
+ String jsondata = "";
{code}
(5) I have some concerns about this:
{code}
+ // generate document id from document URI using md5 checksum.
+ // document id must be shorter than 128 characters because of
AmazonCloudSearch.
+ try {
+ doc.setId(URLEncoder.encode(generateid(documentURI),"utf-8"));
+ } catch (NoSuchAlgorithmException e) {
+ throw new ManifoldCFException(e);
+ }
+
{code}
Generally, since hashes have a possibility of colliding, it is better to send
the URI as metadata, and then look up the document identifier by searching for
the document metadata URI. You can create a document identifier then only if
you don't find the matching record. If that is not possible, I'd still
structure the code so that there's a single place where the hashing is done.
You might also look at and use:
http://manifoldcf.apache.org/release/trunk/api/framework/org/apache/manifoldcf/core/system/ManifoldCF.html#encrypt%28java.lang.String%29
Why does this need to be included 5 times? xercesImpl is also included twice
too.
> Amazon CloudSearch output connector
> -----------------------------------
>
> Key: CONNECTORS-916
> URL: https://issues.apache.org/jira/browse/CONNECTORS-916
> Project: ManifoldCF
> Issue Type: New Feature
> Components: Amazon CloudSearch output connector
> Affects Versions: ManifoldCF 1.7
> Reporter: Takumi Yoshida
> Assignee: Karl Wright
> Fix For: ManifoldCF 1.7
>
> Attachments: 0507.diff, 0520.diff, 1.patch, 2.diff, 3.diff,
> AmazonCloudSearchParam.java, AmazonCloudSearchSpecs.java,
> exception_handling.diff, exception_handling_2.diff
>
>
> I wrote some codes snipetts of output connector for Amazon CloudSearch.
> I would like you to review my code. You can crawl web site and feed HTML page
> to Amazon CloudSearch.
> but it is not perfectly completed followoing reason.
> - does not write any codes for configuration page.
> - supporting file type is only HTML
> Thank you for your time,
> Takumi Yoshida
--
This message was sent by Atlassian JIRA
(v6.2#6252)