[ 
https://issues.apache.org/jira/browse/CONNECTORS-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002600#comment-14002600
 ] 

Karl Wright commented on CONNECTORS-916:
----------------------------------------

bq. About (5) - there are two limitations on the id. You cannot use multi-byte 
characters, and the id can be at most 64 characters long. Do you have any good 
ideas for generating an id from the document URI?

I would use ManifoldCF.hash(document_id) to create an ID that Amazon 
CloudSearch will accept.  The hash produced is a base64-encoded SHA hash, which 
is 40 characters long.  There is no need to URL-encode it afterwards either.  
ManifoldCF uses this everywhere.
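
To illustrate the idea only: the following is a minimal standalone sketch, not 
the ManifoldCF.hash implementation itself, whose internals and encoding may 
differ. It assumes SHA-1 with lowercase hex encoding, so any document URI, 
including one with multi-byte characters, maps to a fixed 40-character ASCII 
string that fits within the 64-character limit. The class name DocIdHasher and 
the method toAsciiId are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a short, ASCII-only id from a document URI.
public class DocIdHasher {

    // Returns the SHA-1 digest of the URI's UTF-8 bytes as lowercase hex.
    // SHA-1 yields 20 bytes, so the result is always 40 characters long.
    public static String toAsciiId(String documentUri) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(documentUri.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Emit the high nibble, then the low nibble, of each byte.
                hex.append(Character.forDigit((b >> 4) & 0xF, 16));
                hex.append(Character.forDigit(b & 0xF, 16));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        // Example URI containing multi-byte characters (made up for illustration).
        System.out.println(toAsciiId("http://example.com/日本語.html"));
    }
}
```

Because the output contains only the characters 0-9 and a-f, it needs no URL 
encoding, which matches the point made above.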


> Amazon CloudSearch output connector
> -----------------------------------
>
>                 Key: CONNECTORS-916
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-916
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Amazon CloudSearch output connector
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Takumi Yoshida
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>         Attachments: 0507.diff, 0520.diff, 1.patch, 2.diff, 3.diff, 
> AmazonCloudSearchParam.java, AmazonCloudSearchSpecs.java, 
> exception_handling.diff, exception_handling_2.diff
>
>
> I wrote some code snippets for an Amazon CloudSearch output connector.
> I would like you to review my code. With it you can crawl a web site and 
> feed HTML pages to Amazon CloudSearch.
> However, it is not yet complete, for the following reasons:
> - no code has been written for the configuration page
> - the only supported file type is HTML
> Thank you for your time,
>  Takumi Yoshida



