[
https://issues.apache.org/jira/browse/CONNECTORS-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010375#comment-14010375
]
Takumi Yoshida commented on CONNECTORS-916:
-------------------------------------------
Hi Karl,
bq. Another major point: Tika, it seems to me, may well process documents
completely in memory.
I checked Tika document. And I found that Tika support "streamied parsing" and
does not hold hole of a document on memory. Also the parsing method use SAX for
processing metadata.
http://tika.apache.org/1.5/parser.html
It looks fine for me. What do you think about this?
> Amazon CloudSearch output connector
> -----------------------------------
>
> Key: CONNECTORS-916
> URL: https://issues.apache.org/jira/browse/CONNECTORS-916
> Project: ManifoldCF
> Issue Type: New Feature
> Components: Amazon CloudSearch output connector
> Affects Versions: ManifoldCF 1.7
> Reporter: Takumi Yoshida
> Assignee: Karl Wright
> Fix For: ManifoldCF 1.7
>
> Attachments: 0507.diff, 0520.diff, 0520_2.diff, 0527.diff, 1.patch,
> 2.diff, 3.diff, AmazonCloudSearchParam.java, AmazonCloudSearchSpecs.java,
> exception_handling.diff, exception_handling_2.diff, licenselist.txt
>
>
> I wrote some codes snipetts of output connector for Amazon CloudSearch.
> I would like you to review my code. You can crawl web site and feed HTML page
> to Amazon CloudSearch.
> but it is not perfectly completed followoing reason.
> - does not write any codes for configuration page.
> - supporting file type is only HTML
> Thank you for your time,
> Takumi Yoshida
--
This message was sent by Atlassian JIRA
(v6.2#6252)