namazu is now running without problems on CentOS under VMware, so I went on to set up the production machine, but there mknmz reports
Unable to convert pdf file.
 pdftotext -q -raw -enc EUC -opw password pdffile
and
 pdftotext -cfg /usr/local/etc/xpdfrc -enc EUC -opw password pdffile
both extract the text from the PDF file.
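
For reference, the manual check above could be scripted so that it runs identically on both machines and records the exit status and stderr. This is only a minimal sketch: the function names `build_cmd` and `run_pdftotext` are made up for illustration, `-q` is left out on purpose so that error messages are not suppressed, and it relies on pdftotext sending the text to stdout when the text-file argument is `-`.

```python
import subprocess


def build_cmd(pdf_path, owner_password, extra=()):
    """Build a pdftotext argument list like the ones tried by hand.

    extra holds options such as ["-raw"] or ["-cfg", "/usr/local/etc/xpdfrc"].
    The trailing "-" asks pdftotext to write the text to stdout.
    """
    return ["pdftotext", *extra, "-enc", "EUC",
            "-opw", owner_password, pdf_path, "-"]


def run_pdftotext(cmd):
    """Run the command and return (exit status, bytes of text, stderr)."""
    proc = subprocess.run(cmd, capture_output=True)
    stderr_text = proc.stderr.decode("utf-8", errors="replace")
    return proc.returncode, len(proc.stdout), stderr_text
```

Calling `run_pdftotext(build_cmd("pdffile", "password", ["-raw"]))` on the VMware machine and on the production machine would then give two results that can be compared directly.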
 The output of mknmz -C is:
Loaded rcfile: /usr/local/etc/namazu/mknmzrc
System: linux
Namazu: 2.0.21
Perl: 5.010001
File-MMagic: 1.27
NKF: /usr/bin/nkf
KAKASI: /usr/local/bin/kakasi -ieuc -oeuc -w
ChaSen: no
MeCab: no
Wakati: /usr/local/bin/kakasi -ieuc -oeuc -w
Lang_Msg: en_US.UTF-8
Lang: en_US.UTF-8
Coding System: euc
CONFDIR: /usr/local/etc/namazu
LIBDIR: /usr/local/share/namazu/pl
FILTERDIR: /usr/local/share/namazu/filter
TEMPLATEDIR: /usr/local/share/namazu/template
Supported media types:   (37)
Unsupported media types: (11) marked with minus (-) probably missing application in your $path.
- application/excel: excel.pl
  application/gnumeric: gnumeric.pl
  application/ichitaro5: taro56.pl
  application/ichitaro6: taro56.pl
- application/ichitaro7: taro7_10.pl
  application/macbinary: macbinary.pl
- application/msword: msword.pl
  application/pdf: pdf.pl
- application/postscript: postscript.pl
- application/powerpoint: powerpoint.pl
- application/rtf: rtf.pl
  application/vnd.kde.kivio: koffice.pl
  application/vnd.kde.kpresenter: koffice.pl
  application/vnd.kde.kspread: koffice.pl
  application/vnd.kde.kword: koffice.pl
  application/vnd.oasis.opendocument.graphics: ooo.pl
  application/vnd.oasis.opendocument.presentation: ooo.pl
  application/vnd.oasis.opendocument.spreadsheet: ooo.pl
  application/vnd.oasis.opendocument.text: ooo.pl
  application/vnd.openxmlformats-officedocument.presentationml: msofficexml.pl
  application/vnd.openxmlformats-officedocument.spreadsheetml: msofficexml.pl
  application/vnd.openxmlformats-officedocument.wordprocessingml: msofficexml.pl
  application/vnd.sun.xml.calc: ooo.pl
  application/vnd.sun.xml.draw: ooo.pl
  application/vnd.sun.xml.impress: ooo.pl
  application/vnd.sun.xml.writer: ooo.pl
  application/vnd.visio: visio.pl
  application/x-apache-cache: apachecache.pl
  application/x-bzip2: bzip2.pl
  application/x-compress: compress.pl
- application/x-deb: deb.pl
- application/x-dvi: dvi.pl
  application/x-gzip: gzip.pl
- application/x-js-taro: taro7_10.pl
  application/x-rpm: rpm.pl
- application/x-tex: tex.pl
  application/x-zip: zip.pl
- audio/mpeg: mp3.pl
  message/news: mailnews.pl
  message/rfc822: mailnews.pl
  text/hnf: hnf.pl
  text/html: html.pl
  text/html; x-type=mhonarc: mhonarc.pl
  text/html; x-type=pipermail: pipermail.pl
  text/plain
  text/plain; x-type=rfc: rfc.pl
  text/x-hdml: hdml.pl
  text/x-roff: man.pl
  
 The output of pdftotext is:
pdftotext version 3.03
Copyright 1996-2011 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
  -f <int>          : first page to convert
  -l <int>          : last page to convert
  -layout           : maintain original physical layout
  -fixed <fp>       : assume fixed-pitch (or tabular) text
  -raw              : keep strings in content stream order
  -htmlmeta         : generate a simple HTML file, including the meta information
  -enc <string>     : output text encoding name
  -eol <string>     : output end-of-line convention (unix, dos, or mac)
  -nopgbrk          : don't insert page breaks between pages
  -opw <string>     : owner password (for encrypted files)
  -upw <string>     : user password (for encrypted files)
  -q                : don't print any messages or errors
  -cfg <string>     : configuration file to use in place of .xpdfrc
  -v                : print copyright and version info
  -h                : print usage information
  -help             : print usage information
  --help            : print usage information
  -?                : print usage information
 The output of pdfinfo is:
pdfinfo version 3.03
Copyright 1996-2011 Glyph & Cog, LLC
Usage: pdfinfo [options] <PDF-file>
  -f <int>          : first page to convert
  -l <int>          : last page to convert
  -box              : print the page bounding boxes
  -meta             : print the document metadata (XML)
  -rawdates         : print the undecoded date strings directly from the PDF file
  -enc <string>     : output text encoding name
  -opw <string>     : owner password (for encrypted files)
  -upw <string>     : user password (for encrypted files)
  -cfg <string>     : configuration file to use in place of .xpdfrc
  -v                : print copyright and version info
  -h                : print usage information
  -help             : print usage information
  --help            : print usage information
  -?                : print usage information
   As a check, I copied a PDF with no password into the directory being
  indexed, and it was indexed correctly.
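
Since only the password-protected PDF fails, it may help to see how pdfinfo classifies each file. The sketch below parses the `Encrypted:` line that pdfinfo prints for a document. The function name `parse_encrypted` and the sample text are made up for illustration; the sample is not output from my machines.

```python
import re


def parse_encrypted(pdfinfo_output):
    """Return True or False from the 'Encrypted:' line, or None if absent."""
    match = re.search(r"^Encrypted:\s*(\S+)", pdfinfo_output, re.MULTILINE)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


# Illustrative sample only, shaped like pdfinfo output for a protected file.
SAMPLE_PROTECTED = (
    "Title:          test\n"
    "Encrypted:      yes (print:yes copy:no change:no addNotes:no)\n"
    "Pages:          3\n"
)
SAMPLE_PLAIN = "Title:          test\nEncrypted:      no\nPages:          3\n"
```

Feeding the real pdfinfo output for the failing PDF and for the unprotected PDF to `parse_encrypted` on both machines would show whether they are classified the same way.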
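   At my wits' end, I transferred the pdf.pl from the working VMware machine and
  the pdf.pl from the failing physical machine to Win7 and compared them with
  the FC command, but found no differences.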
   What should I suspect?
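
Because pdf.pl itself is identical, the same comparison could be extended to everything under FILTERDIR and LIBDIR on both machines. The following is a minimal sketch under that assumption; `hash_tree` and `diff_trees` are hypothetical helper names, and the directories to compare would be copies of /usr/local/share/namazu/filter and /usr/local/share/namazu/pl taken from each machine.

```python
import hashlib
from pathlib import Path


def hash_tree(root):
    """Map each file's path relative to root to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            digests[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(a, b):
    """Return the sorted names that are missing on one side or differ."""
    return sorted(name for name in a.keys() | b.keys()
                  if a.get(name) != b.get(name))
```

An empty list from `diff_trees(hash_tree(vmware_copy), hash_tree(production_copy))` would rule out the Namazu Perl files and point toward the environment instead.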
  

_______________________________________________
Namazu-users-ja mailing list
Namazu-users-ja@namazu.org
http://www.namazu.org/cgi-bin/mailman/listinfo/namazu-users-ja
