[ https://issues.apache.org/jira/browse/CONNECTORS-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554653#comment-13554653 ]

Karl Wright commented on CONNECTORS-613:
----------------------------------------

I turned on wire debugging and captured the HTTP exchange between ManifoldCF 
and Solr for the files. Here it is:

{code}
DEBUG 2013-01-15 20:58:48,069 (Thread-391) - >> "GET /solr/admin/ping?wt=xml&version=2.2 HTTP/1.1[\r][\n]"
DEBUG 2013-01-15 20:58:48,071 (Thread-391) - >> "User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0[\r][\n]"
DEBUG 2013-01-15 20:58:48,071 (Thread-391) - >> "Host: localhost:8983[\r][\n]"
DEBUG 2013-01-15 20:58:48,071 (Thread-391) - >> "Connection: Keep-Alive[\r][\n]"
DEBUG 2013-01-15 20:58:48,071 (Thread-391) - >> "[\r][\n]"
DEBUG 2013-01-15 20:58:48,114 (Thread-391) - << "HTTP/1.1 200 OK[\r][\n]"
DEBUG 2013-01-15 20:58:48,116 (Thread-391) - << "Content-Type: application/xml; charset=UTF-8[\r][\n]"
DEBUG 2013-01-15 20:58:48,116 (Thread-391) - << "Transfer-Encoding: chunked[\r][\n]"
DEBUG 2013-01-15 20:58:48,116 (Thread-391) - << "[\r][\n]"
DEBUG 2013-01-15 20:58:48,132 (Thread-391) - << "1AE[\r][\n]"
DEBUG 2013-01-15 20:58:48,132 (Thread-391) - << "<"
DEBUG 2013-01-15 20:58:48,132 (Thread-391) - << "?"
DEBUG 2013-01-15 20:58:48,132 (Thread-391) - << "x"
DEBUG 2013-01-15 20:58:48,133 (Thread-391) - << "ml version="1.0" encoding="U"
DEBUG 2013-01-15 20:58:48,134 (Thread-391) - << "TF-8"?>[\n]"
DEBUG 2013-01-15 20:58:48,134 (Thread-391) - << "<response>[\n]"
DEBUG 2013-01-15 20:58:48,134 (Thread-391) - << "<lst name="responseHeader"><int name="status">0</int><int name="QTime">4</int><lst name="params"><str name="df">text</str><str name="echoParams">all</str><str name="rows">10</str><str name="echoParams">all</str><str name="wt">xml</str><str name="version">2.2</str><str name="q">solrpingquery</str><str name="distrib">false</str></lst></lst><str name="status">OK</str>[\n]"
DEBUG 2013-01-15 20:58:48,134 (Thread-391) - << "</response>[\n]"
DEBUG 2013-01-15 20:58:48,141 (Thread-391) - << "[\r][\n]"
DEBUG 2013-01-15 20:58:48,141 (Thread-391) - << "0[\r][\n]"
DEBUG 2013-01-15 20:58:48,141 (Thread-391) - << "[\r][\n]"
DEBUG 2013-01-15 21:00:58,607 (Thread-760) - >> "POST /solr/update/extract?literal.id=file%3A%2FC%3A%2Ftemparea%2Ffiles%2Futf-8.txt&literal.uri=C%3A%5Ctemparea%5Cfiles%5Cutf-8.txt&wt=xml&version=2.2 HTTP/1.1[\r][\n]"
DEBUG 2013-01-15 21:00:58,607 (Thread-760) - >> "User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0[\r][\n]"
DEBUG 2013-01-15 21:00:58,607 (Thread-760) - >> "Transfer-Encoding: chunked[\r][\n]"
DEBUG 2013-01-15 21:00:58,607 (Thread-760) - >> "Content-Type: application/octet-stream[\r][\n]"
DEBUG 2013-01-15 21:00:58,607 (Thread-760) - >> "Host: localhost:8983[\r][\n]"
DEBUG 2013-01-15 21:00:58,607 (Thread-760) - >> "Connection: Keep-Alive[\r][\n]"
DEBUG 2013-01-15 21:00:58,607 (Thread-760) - >> "[\r][\n]"
DEBUG 2013-01-15 21:00:58,622 (Thread-761) - >> "POST /solr/update/extract?literal.id=file%3A%2FC%3A%2Ftemparea%2Ffiles%2Fsjis.txt&literal.uri=C%3A%5Ctemparea%5Cfiles%5Csjis.txt&wt=xml&version=2.2 HTTP/1.1[\r][\n]"
DEBUG 2013-01-15 21:00:58,622 (Thread-761) - >> "User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0[\r][\n]"
DEBUG 2013-01-15 21:00:58,622 (Thread-761) - >> "Transfer-Encoding: chunked[\r][\n]"
DEBUG 2013-01-15 21:00:58,622 (Thread-761) - >> "Content-Type: application/octet-stream[\r][\n]"
DEBUG 2013-01-15 21:00:58,622 (Thread-761) - >> "Host: localhost:8983[\r][\n]"
DEBUG 2013-01-15 21:00:58,622 (Thread-761) - >> "Connection: Keep-Alive[\r][\n]"
DEBUG 2013-01-15 21:00:58,622 (Thread-761) - >> "[\r][\n]"
DEBUG 2013-01-15 21:00:58,632 (Thread-760) - >> "15[\r][\n]"
DEBUG 2013-01-15 21:00:58,632 (Thread-760) - >> "This is a utf-8 text."
DEBUG 2013-01-15 21:00:58,632 (Thread-760) - >> "[\r][\n]"
DEBUG 2013-01-15 21:00:58,632 (Thread-760) - >> "0[\r][\n]"
DEBUG 2013-01-15 21:00:58,632 (Thread-760) - >> "[\r][\n]"
DEBUG 2013-01-15 21:00:58,643 (Thread-761) - >> "31[\r][\n]"
DEBUG 2013-01-15 21:00:58,643 (Thread-761) - >> "This is a sjis text. [0x82][0xb1][0x82][0xea][0x82][0xcd][0x93][0xfa][0x96]{[0x8c][0xea][0x82][0xcc][0x83]e[0x83]L[0x83]X[0x83]g[0x82][0xc5][0x82][0xb7][0x81]B"
DEBUG 2013-01-15 21:00:58,643 (Thread-761) - >> "[\r][\n]"
DEBUG 2013-01-15 21:00:58,643 (Thread-761) - >> "0[\r][\n]"
DEBUG 2013-01-15 21:00:58,643 (Thread-761) - >> "[\r][\n]"
DEBUG 2013-01-15 21:00:59,120 (Thread-761) - << "HTTP/1.1 200 OK[\r][\n]"
DEBUG 2013-01-15 21:00:59,120 (Thread-761) - << "Content-Type: application/xml; charset=UTF-8[\r][\n]"
DEBUG 2013-01-15 21:00:59,120 (Thread-761) - << "Transfer-Encoding: chunked[\r][\n]"
DEBUG 2013-01-15 21:00:59,120 (Thread-761) - << "[\r][\n]"
DEBUG 2013-01-15 21:00:59,121 (Thread-761) - << "95[\r][\n]"
DEBUG 2013-01-15 21:00:59,121 (Thread-761) - << "<"
DEBUG 2013-01-15 21:00:59,121 (Thread-761) - << "?"
DEBUG 2013-01-15 21:00:59,121 (Thread-761) - << "x"
DEBUG 2013-01-15 21:00:59,121 (Thread-761) - << "ml version="1.0" encoding="U"
DEBUG 2013-01-15 21:00:59,121 (Thread-761) - << "TF-8"?>[\n]"
DEBUG 2013-01-15 21:00:59,121 (Thread-761) - << "<response>[\n]"
DEBUG 2013-01-15 21:00:59,121 (Thread-761) - << "<lst name="responseHeader"><int name="status">0</int><int name="QTime">471</int></lst>[\n]"
DEBUG 2013-01-15 21:00:59,121 (Thread-761) - << "</response>[\n]"
DEBUG 2013-01-15 21:00:59,121 (Thread-761) - << "[\r][\n]"
DEBUG 2013-01-15 21:00:59,121 (Thread-761) - << "0[\r][\n]"
DEBUG 2013-01-15 21:00:59,121 (Thread-761) - << "[\r][\n]"
DEBUG 2013-01-15 21:00:59,137 (Thread-760) - << "HTTP/1.1 200 OK[\r][\n]"
DEBUG 2013-01-15 21:00:59,137 (Thread-760) - << "Content-Type: application/xml; charset=UTF-8[\r][\n]"
DEBUG 2013-01-15 21:00:59,137 (Thread-760) - << "Transfer-Encoding: chunked[\r][\n]"
DEBUG 2013-01-15 21:00:59,137 (Thread-760) - << "[\r][\n]"
DEBUG 2013-01-15 21:00:59,138 (Thread-760) - << "95[\r][\n]"
DEBUG 2013-01-15 21:00:59,138 (Thread-760) - << "<"
DEBUG 2013-01-15 21:00:59,138 (Thread-760) - << "?"
DEBUG 2013-01-15 21:00:59,138 (Thread-760) - << "x"
DEBUG 2013-01-15 21:00:59,138 (Thread-760) - << "ml version="1.0" encoding="U"
DEBUG 2013-01-15 21:00:59,138 (Thread-760) - << "TF-8"?>[\n]"
DEBUG 2013-01-15 21:00:59,138 (Thread-760) - << "<response>[\n]"
DEBUG 2013-01-15 21:00:59,138 (Thread-760) - << "<lst name="responseHeader"><int name="status">0</int><int name="QTime">497</int></lst>[\n]"
DEBUG 2013-01-15 21:00:59,138 (Thread-760) - << "</response>[\n]"
DEBUG 2013-01-15 21:00:59,138 (Thread-760) - << "[\r][\n]"
DEBUG 2013-01-15 21:00:59,138 (Thread-760) - << "0[\r][\n]"
DEBUG 2013-01-15 21:00:59,138 (Thread-760) - << "[\r][\n]"
DEBUG 2013-01-15 21:01:14,326 (Thread-855) - >> "POST /solr/update HTTP/1.1[\r][\n]"
DEBUG 2013-01-15 21:01:14,327 (Thread-855) - >> "Content-Charset: UTF-8[\r][\n]"
DEBUG 2013-01-15 21:01:14,327 (Thread-855) - >> "Content-Type: application/x-www-form-urlencoded; charset=UTF-8[\r][\n]"
DEBUG 2013-01-15 21:01:14,327 (Thread-855) - >> "User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0[\r][\n]"
DEBUG 2013-01-15 21:01:14,327 (Thread-855) - >> "Content-Length: 65[\r][\n]"
DEBUG 2013-01-15 21:01:14,327 (Thread-855) - >> "Host: localhost:8983[\r][\n]"
DEBUG 2013-01-15 21:01:14,327 (Thread-855) - >> "Connection: Keep-Alive[\r][\n]"
DEBUG 2013-01-15 21:01:14,327 (Thread-855) - >> "[\r][\n]"
DEBUG 2013-01-15 21:01:14,327 (Thread-855) - >> "commit=true&softCommit=false&waitSearcher=true&wt=xml&version=2.2"
DEBUG 2013-01-15 21:01:14,341 (Thread-855) - << "HTTP/1.1 200 OK[\r][\n]"
DEBUG 2013-01-15 21:01:14,341 (Thread-855) - << "Content-Type: application/xml; charset=UTF-8[\r][\n]"
DEBUG 2013-01-15 21:01:14,341 (Thread-855) - << "Transfer-Encoding: chunked[\r][\n]"
DEBUG 2013-01-15 21:01:14,341 (Thread-855) - << "[\r][\n]"
DEBUG 2013-01-15 21:01:14,341 (Thread-855) - << "94[\r][\n]"
DEBUG 2013-01-15 21:01:14,341 (Thread-855) - << "<"
DEBUG 2013-01-15 21:01:14,342 (Thread-855) - << "?"
DEBUG 2013-01-15 21:01:14,342 (Thread-855) - << "x"
DEBUG 2013-01-15 21:01:14,342 (Thread-855) - << "ml version="1.0" encoding="U"
DEBUG 2013-01-15 21:01:14,342 (Thread-855) - << "TF-8"?>[\n]"
DEBUG 2013-01-15 21:01:14,342 (Thread-855) - << "<response>[\n]"
DEBUG 2013-01-15 21:01:14,342 (Thread-855) - << "<lst name="responseHeader"><int name="status">0</int><int name="QTime">11</int></lst>[\n]"
DEBUG 2013-01-15 21:01:14,342 (Thread-855) - << "</response>[\n]"
DEBUG 2013-01-15 21:01:14,342 (Thread-855) - << "[\r][\n]"
DEBUG 2013-01-15 21:01:14,342 (Thread-855) - << "0[\r][\n]"
DEBUG 2013-01-15 21:01:14,342 (Thread-855) - << "[\r][\n]"
{code}
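
As a cross-check on the capture above, the chunk-size lines can be compared against the payload bytes. The sketch below is only an illustration and is not part of ManifoldCF or SolrJ: `wire_to_bytes` is a hypothetical helper that turns the wire-log notation (`[0x82]`, `[\r]`, `[\n]`) back into raw bytes, and it assumes the space after "This is a sjis text." is a real byte on the wire rather than a line-wrapping artifact.

```python
import re

# Wire-log escapes: [0x82] is one raw byte, [\r] and [\n] are CR and LF.
_ESCAPE = re.compile(r"\[0x([0-9a-fA-F]{2})\]|\[\\([rn])\]")


def wire_to_bytes(text: str) -> bytes:
    """Convert wire-log notation back into the bytes it represents."""
    out = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(text):
        # Everything between escapes is plain ASCII in this log format.
        out += text[pos:match.start()].encode("ascii")
        if match.group(1):
            out.append(int(match.group(1), 16))
        else:
            out += b"\r" if match.group(2) == "r" else b"\n"
        pos = match.end()
    out += text[pos:].encode("ascii")
    return bytes(out)


# Chunk sizes (hex) and chunk data copied from the two extract requests.
UTF8_CHUNK_SIZE = "15"
UTF8_CHUNK_DATA = "This is a utf-8 text."
SJIS_CHUNK_SIZE = "31"
SJIS_CHUNK_DATA = (
    "This is a sjis text. "  # assumption: the trailing space is a real byte
    "[0x82][0xb1][0x82][0xea][0x82][0xcd][0x93][0xfa][0x96]{[0x8c][0xea]"
    "[0x82][0xcc][0x83]e[0x83]L[0x83]X[0x83]g[0x82][0xc5][0x82][0xb7][0x81]B"
)

utf8_payload = wire_to_bytes(UTF8_CHUNK_DATA)
sjis_payload = wire_to_bytes(SJIS_CHUNK_DATA)

# Each comparison prints True when the payload length equals the chunk size.
print(len(utf8_payload) == int(UTF8_CHUNK_SIZE, 16))
print(len(sjis_payload) == int(SJIS_CHUNK_SIZE, 16))
# Decode the second payload as Shift_JIS to confirm the bytes are intact.
print(sjis_payload.decode("shift_jis"))
```

Under those assumptions both payload lengths match their chunk sizes (0x15 and 0x31), and the second payload decodes cleanly as Shift_JIS. That suggests the file bytes reach Solr unmodified, with Content-Type application/octet-stream and no charset information in the request headers.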

                
> The content of sjis file can't be extracted
> -------------------------------------------
>
>                 Key: CONNECTORS-613
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-613
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: File system connector, Lucene/SOLR connector
>    Affects Versions: ManifoldCF 1.0.1, ManifoldCF 1.1
>         Environment: Solr 4.x (not Solr 3.x)
>            Reporter: Shinichiro Abe
>             Fix For: ManifoldCF 1.1
>
>         Attachments: files.zip
>
>
> When the sjis text file is posted with curl, its content can be extracted.
> {noformat}
> curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F "[email protected]"
> {noformat} 
> But when the same file is posted by the File system connector, its content 
> can't be extracted; the result is empty.
> The content of the utf-8 text file does seem to be extracted.
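
For anyone who wants to replay the connector-style post without ManifoldCF, a request with the same URL parameters and Content-Type as in the wire log could be built roughly as follows. This is a sketch only: `build_extract_request` is a hypothetical helper, the base URL is inferred from the Host header in the capture, `sample_body` is a stand-in for the real file contents, and urllib would send a Content-Length body rather than the chunked encoding SolrJ used.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(base_url: str, doc_id: str, uri: str, body: bytes) -> Request:
    """Build (but do not send) a raw-body POST to Solr's extract handler."""
    # Parameter names and values mirror the query string seen in the wire log.
    query = urlencode({
        "literal.id": doc_id,
        "literal.uri": uri,
        "wt": "xml",
        "version": "2.2",
    })
    return Request(
        f"{base_url}/update/extract?{query}",
        data=body,
        # Same Content-Type as the captured request: no charset is declared.
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


# Hypothetical stand-in for the file contents; in practice read sjis.txt as bytes.
sample_body = "これは日本語のテキストです。".encode("shift_jis")

request = build_extract_request(
    "http://localhost:8983/solr",
    "file:/C:/temparea/files/sjis.txt",
    "C:\\temparea\\files\\sjis.txt",
    sample_body,
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it to a running Solr instance. Unlike the curl `-F` form, which wraps the file in a multipart/form-data body, the body here is the raw file bytes, which is the difference between the two cases worth testing.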

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
