[jira] [Created] (CONNECTORS-1006) Google native documents are not crawled

Shigeki Kobayashi (JIRA) Mon, 11 Aug 2014 03:15:29 -0700

Shigeki Kobayashi created CONNECTORS-1006:
---------------------------------------------


             Summary: Google native documents are not crawled
                 Key: CONNECTORS-1006
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1006
             Project: ManifoldCF
          Issue Type: Bug
          Components: GoogleDrive connector
    Affects Versions: ManifoldCF 1.4.1
            Reporter: Shigeki Kobayashi


I am using MCF 1.4.1 to crawl native Google documents, such as spreadsheets, 
and index them into Solr.
MCF does not appear to extract their contents; it may not be exporting the 
spreadsheets to PDF.

The Simple History report shows the crawl result as "NO LENGTH".
 
The documents are saved as Google Spreadsheets in Google Docs and are 
managed in Google Drive.

The MCF documentation says that "native Google documents such as spreadsheets and 
word documents are exported to PDF and then ingested", so these Google 
Spreadsheets should be crawled and indexed.
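
For reference, the behavior the documentation describes could be sketched 
roughly as below. This is not the connector's actual code. It assumes a 
Drive-style metadata record in which ordinary files carry a direct download 
URL while native documents carry only a map of export links keyed by MIME 
type. The class name, method name, and example URLs are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: choose the URL to fetch content from for one document.
class ExportUrlSketch {
    static final String PDF = "application/pdf";

    // downloadUrl: direct content URL, assumed absent for native Google documents.
    // exportLinks: MIME type -> export URL, assumed present only for native documents.
    static Optional<String> pickUrl(String downloadUrl, Map<String, String> exportLinks) {
        if (downloadUrl != null && !downloadUrl.isEmpty()) {
            // Ordinary uploaded file: fetch the bytes directly.
            return Optional.of(downloadUrl);
        }
        if (exportLinks != null) {
            // Native document: fall back to the PDF export, if one is offered.
            return Optional.ofNullable(exportLinks.get(PDF));
        }
        // Nothing to fetch; the caller would skip the document.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A spreadsheet-like record: no direct URL, only a PDF export link.
        Map<String, String> links =
            Collections.singletonMap(PDF, "https://example.invalid/export?format=pdf");
        System.out.println(pickUrl(null, links).orElse("skipped"));
    }
}
```

With a fallback like this, a spreadsheet that has no direct download URL 
would still produce a PDF stream to ingest instead of being skipped.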



--
This message was sent by Atlassian JIRA
(v6.2#6252)
