Hello,

I am using Apache Tika. It's really a great choice.
But I need your help with word counting. I used the following command to
get the WORD-COUNT from the METADATA:

    Input: java -jar tika_cmd.jar --metadata XXX.doc

    Output:

Application-Name: Microsoft Office Word
Author: XXX
Character Count: 10329
Company:
Content-Length: 47616
Content-Type: application/msword
Creation-Date: 2012-08-01T14:34:00Z
Edit-Time: 600000000
Last-Modified: 2012-08-01T14:34:00Z
Last-Printed: 2012-08-01T14:32:00Z
Last-Save-Date: 2012-08-01T14:34:00Z
Page-Count: 6
Revision-Number: 2
Template: Normal.dotm
Word-Count: 1812
cp:revision: 2
creator: xXX
date: 2012-08-01T14:34:00Z
dc:creator: XXX
dc:title: MUTUAL CONFIDENTIALITY AGREEMENT
dcterms:created: 2012-08-01T14:34:00Z
dcterms:modified: 2012-08-01T14:34:00Z
extended-properties:Application: Microsoft Office Word
extended-properties:Company:
extended-properties:Template: Normal.dotm
meta:author: XXX
meta:character-count: 10329
meta:creation-date: 2012-08-01T14:34:00Z
meta:last-author: Roxanne Potgieter
meta:page-count: 6
meta:print-date: 2012-08-01T14:32:00Z
meta:save-date: 2012-08-01T14:34:00Z
meta:word-count: 1812
modified: 2012-08-01T14:34:00Z
resourceName: Confidentiality Agreement.doc
title: MUTUAL CONFIDENTIALITY AGREEMENT
xmpTPg:NPages: 6
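
If the word count is needed inside a Java program, one option is to parse the `key: value` lines that the `--metadata` command prints. This is a minimal sketch under stated assumptions, not part of Tika: the class `MetadataWordCount` and its `wordCount` method are hypothetical names, and the code only looks for the two keys that appear in the output above (`meta:word-count` and `Word-Count`).

```java
import java.util.List;
import java.util.OptionalInt;

/**
 * Hypothetical helper: extracts the word count from the "key: value" lines
 * printed by `java -jar tika_cmd.jar --metadata FILE`.
 */
final class MetadataWordCount {

    // Keys observed in the .doc output; "meta:word-count" is checked first.
    private static final String[] KEYS = {"meta:word-count", "Word-Count"};

    static OptionalInt wordCount(List<String> metadataLines) {
        for (String key : KEYS) {
            String prefix = key + ":";
            for (String line : metadataLines) {
                String trimmed = line.trim(); // tolerate stray leading spaces
                if (trimmed.startsWith(prefix)) {
                    String value = trimmed.substring(prefix.length()).trim();
                    try {
                        return OptionalInt.of(Integer.parseInt(value));
                    } catch (NumberFormatException e) {
                        // Malformed value: keep scanning the remaining lines.
                    }
                }
            }
        }
        return OptionalInt.empty(); // no word-count entry in the output
    }

    public static void main(String[] args) {
        List<String> docOutput = List.of(
            "Page-Count: 6",
            "Word-Count: 1812",
            "meta:word-count: 1812");
        List<String> xlsxOutput = List.of(
            "Application-Name: Microsoft Excel",
            "meta:save-date: 2013-02-05T14:13:31Z");
        System.out.println(wordCount(docOutput));  // OptionalInt[1812]
        System.out.println(wordCount(xlsxOutput)); // OptionalInt.empty
    }
}
```

For output that lacks both keys, such as the spreadsheet listing, the method returns `OptionalInt.empty` rather than guessing a value.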

Now I am using the same command for other documents that were created in
OpenOffice or LibreOffice and saved as doc, docx, xls, xlsx, ppt or pptx,
but I am not getting a WORD-COUNT.

      Input: java -jar tika_cmd.jar --metadata XXX.doc      (XXX.doc is a
file that was created in OpenOffice or LibreOffice)

      Output:

Application-Name: Microsoft Excel
Application-Version: 12.0000
Author: XXX
Content-Length: 15986
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Creation-Date: 2013-01-30T16:15:54Z
Last-Modified: 2013-02-05T14:13:31Z
Last-Save-Date: 2013-02-05T14:13:31Z
creator: XXX
date: 2013-01-30T16:15:54Z
dc:creator: XXX
dc:publisher: XXX
dcterms:created: 2013-01-30T16:15:54Z
dcterms:modified: 2013-02-05T14:13:31Z
extended-properties:AppVersion: 12.0000
extended-properties:Application: Microsoft Excel
extended-properties:Company: XXX
meta:author: XXX
meta:creation-date: 2013-01-30T16:15:54Z
meta:last-author: XXX
meta:save-date: 2013-02-05T14:13:31Z
modified: 2013-02-05T14:13:31Z
protected: false
publisher: leosys
resourceName: XXX

      Could you please suggest why I am not getting a WORD-COUNT?
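
One plausible explanation, offered as an assumption rather than a confirmed answer, is that the Word-Count field reflects a statistic the authoring application stored inside the file, not a count Tika computes itself. For .docx/.xlsx/.pptx files that statistic would live in the `<Words>` element of the `docProps/app.xml` part, and spreadsheets saved in Excel format typically do not include it. The hypothetical sketch below (class `StoredWordCount`, standard library only) opens an OOXML file as a ZIP archive and reports whether that element is present, which helps distinguish "the file never stored a count" from "the count was not extracted". It does not handle the binary .doc/.xls/.ppt formats.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/**
 * Hypothetical helper: reports the word count stored in the extended
 * properties part (docProps/app.xml) of an OOXML file (.docx, .xlsx, .pptx).
 * Binary .doc/.xls/.ppt files are not handled.
 */
final class StoredWordCount {

    // Matches <Words>1812</Words>, with or without a namespace prefix.
    // A regex keeps the sketch short; a real tool would use an XML parser.
    private static final Pattern WORDS =
        Pattern.compile("<(?:\\w+:)?Words>\\s*(\\d+)\\s*</(?:\\w+:)?Words>");

    static OptionalInt fromOoxml(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("docProps/app.xml")) {
                    // readAllBytes stops at the end of the current ZIP entry.
                    String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                    Matcher m = WORDS.matcher(xml);
                    return m.find()
                        ? OptionalInt.of(Integer.parseInt(m.group(1)))
                        : OptionalInt.empty(); // part exists but has no <Words>
                }
            }
        }
        return OptionalInt.empty(); // no docProps/app.xml part at all
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            OptionalInt words = fromOoxml(in);
            System.out.println(words.isPresent()
                ? "Stored word count: " + words.getAsInt()
                : "No word count stored in docProps/app.xml");
        }
    }
}
```

With Java 11 or later it can be run directly as `java StoredWordCount.java XXX.xlsx`. If it reports no stored count, a word count would have to be computed from the extracted text instead.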

---------- Forwarded message ----------
From: nilesh gorle <[email protected]>
Date: 13 February 2013 11:38
Subject: Query On Apache Tika
To: [email protected]

-- 
Thanks & Regards -:

Nilesh G.
[email protected]




-- 
Thanks & Regards -:

Nilesh G.
[email protected]
9970056516
