Hi, I am trying to use the fts_solr plugin and am having some success. Unfortunately, some spam messages I had lying around cause Solr to return an error, e.g.:
HTTP/1.1 400 ParseError at [row,col]:[5,29] Message: An invalid XML character
(Unicode: 0xd84e) was found in the element content of the document.
While I assume that these messages do indeed contain invalid Unicode, my
searches seem to hang when they get an error back from Solr, which causes
other problems. It would be nice if these messages did not trigger an
error from Solr.
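To illustrate the kind of filtering I have in mind: the code point in the error above, 0xD84E, falls in the UTF-16 surrogate range (0xD800-0xDFFF), which the XML 1.0 "Char" production does not allow. Below is a minimal sketch in Python of replacing such characters before text is placed in an XML document. It is not what fts_solr does; the function name strip_invalid_xml_chars and the choice of U+FFFD as the replacement are my own assumptions, for illustration only.

```python
import re

# Matches any character outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str, replacement: str = "\uFFFD") -> str:
    """Replace characters that XML 1.0 forbids (hypothetical helper).

    Lone surrogates such as U+D84E and most control characters are
    swapped for `replacement`; pass "" to drop them instead.
    """
    return _INVALID_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    sample = "spam\ud84etext"  # contains the lone surrogate from the error
    cleaned = strip_invalid_xml_chars(sample)
    # Show the result in escaped form so the terminal encoding does not matter.
    print(cleaned.encode("unicode_escape").decode("ascii"))
```

Something equivalent applied on the indexing side would let the rest of such a message be indexed instead of the whole request being rejected.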
I isolated some of these messages and have attached them. I hope that
the problematic characters come through properly. I have more messages
that cause this problem (sometimes with different Unicode code points
than the one in the message above), but I don’t want to clog the list
up.
Thank you for your help.
best, Erik Hetzner
Config:
# 1.2.13: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32-23-generic i686 Ubuntu 10.04.1 LTS
log_timestamp: %Y-%m-%d %H:%M:%S
protocols: imaps
login_dir: /usr/local/stow/dovecot-1.2.13/var/run/dovecot/login
login_executable: /usr/local/stow/dovecot-1.2.13/libexec/dovecot/imap-login
mail_privileged_group: mail
mail_plugins: virtual fts fts_solr
namespace:
  type: private
  separator: /
  location: maildir:~/Maildir
  inbox: yes
  list: yes
  subscriptions: yes
namespace:
  type: private
  separator: /
  prefix: virtual/
  location: virtual:~/Maildir_virtual:LAYOUT=maildir++
  list: yes
  subscriptions: yes
auth default:
  passdb:
    driver: pam
  userdb:
    driver: passwd
plugin:
  fts: solr
  fts_solr: url=http://localhost:8080/solr/ break-imap-search debug
--- Begin Message ---ËR¸çµÜ·B ^AVïÐÌÓCÒðµÄ¢Ü·ÛRÆ\µÜ·B ±ÌxÍlès«Ì½ßÉ}ç¯åWµÄ¨èÜ·B ܸÍÈPÈIlðl¦Ä¨èÜ·B ½¿ª¢ÔêÊÉÎêÈ¢½ßÌJt[W Å éTCgð¡ØèĢܷB »±ÉÍenæÉ¢éAVìÌBªo^µÄ¢Ü·B »±©çj^[Æ¢¤`Åzª³ê½AVÌBÆZbNXðµÄ¢½¾«Ü·B »µÄÞ½¿É[ðüêÄàç¤`Å»±©çIñÅ¢«½¢ÆvÁĢܷB ¥ñV¶Á¡éðÚwµÄæ£Áľ³¢B úâè½¢»Ì«~ªê¶Ì¨ÉÈéÂ\«Í¢çÅà èÜ·B http://marriage-news.net/?gz14
--- End Message ---
--- Begin Message ---˽¤ÏÈËÆÞ?ÄæÔ®Öú¥µ¥¤¥È¤Î¹ÜÀíÈˤò¤·¤Æ¤¤¤ë¸ßɼ¤ÈÉꤷ¤Þ¤¹¡£ ²»Üz¤Ç·Ç³£¤ËʧÀñ¤ÊÙ|¤ò¤µ¤»¤Æí¤¤Þ¤¹¡£ ½ñ¤ª½ð¤ËÀ§¤Ã¤Æ¤¤¤Þ¤»¤ó¤«?? À§¤Ã¤Æ¤¤¤Ê¤¤·½²»ÐŸФò¤¤¤À¤¤¤Æ¤¤¤ë¤«¤¿¤ÏºÎ¤â¤·¤Æ¤¤¤¿¤À¤«¤Ê¤¯¤Æ¤âȫȻ´óÕÉ·ò¤Ç¤¹¡£ ¤â¤·À§¤Ã¤Æ¤¤¤ë¤«¤¿¤¬¤¤¤¿¤éÊǷǵ±¥µ¥¤¥È¤ò¤ªÊ¹¤¤¤¤¤¿¤À¤±¤Þ¤»¤ó¤«?? ½ñÄÐÐÔ»áT¤ÎÈËÊý¤¬´óä²»×㤷¤Æ¤¤¤ÆÀ§¤Ã¤Æ¤ª¤ê¤Þ¤¹¡£ ÊǷǽñ¤³¤ÎrÆÚ¤ËµÇåh¤·¤Æ¤¤¤¿¤À¤¤¤Æ¤¢¤Ê¤¿¤À¤±¤Î¤ªÏàÊÖ¤ò×÷¤Ã¤Æ¤ß¤Æ¤Ï¤¤¤«¤¬¤Ç¤·¤ç¤¦¤«?? ¤ª´ÖÄ©¤Ç¤¹¤¬¡¢¤â¤·¤è¤±¤ì¤Ð¤è¤í¤·¤¯¤ªî¤¤¤·¤Þ¤¹¡£ PC¤Î·½¤Ï¤³¤Á¤é¤«¤é¤É¤¦¤¾¡ý http://tsuma2.net/pc/main.php?rr5smp06 Я¡¤Î·½¤Ï¤³¤Á¤é¤«¤é¤É¤¦¤¾¡ý http://tsuma2.net/mobile/?rr5smm04 ¡ùµ±¥µ¥¤¥È¤Ï¥¢¥¯¥»¥¹áᡢ϶Τˤ´¤¶¤¤¤Þ¤¹¡¢ÓÐÁÏ¥¹¥Ý¥ó¥µ©`¥µ¥¤¥È¤«¤é¤ÎÚ¸æÙMµÈ¤Ë¤Æß\Ó¤·¤Æ¤ª¤ê¤Þ¤¹¤Î¤Ç¡¢µ±¥µ¥¤¥È¤ÏÍêÈ«oÁϤˤƤ´ÀûÓÃí¤±¤Þ¤¹¡£ ÅäОܷñ¤Ï¤³¤Á¤é [email protected]
--- End Message ---
--- Begin Message ---×î½ü¤Ç¤ÏÅ®ÐÔ¤¬ÄФòÙI¤Ã¤Æ¤¤¤ë¤Ã¤ÆÖª¤Ã¤Æ¤Þ¤¹¤«£¿ Ö÷¤ËÊìÅ®¤È¤«¤ª½ð¤Ï¤¢¤ë¤Î¤Ëʹ¤¤µÀ¤Î¤Ê¤¤¼Å¤·¤¤Å®ÐÔ¤¬¶à¤¤¤ß¤¿¤¤¤À¤±¤É¡¢¤¿¤Þ¤Ë½Ö¤ÇÒ¤«¤±¤Þ¤»¤ó¤«£¿ Èô¤¤ÄФÎ×Ó¤¬¤ªÄ¸¤µ¤ó¤¯¤é¤¤¤ÎÈˤÈÍó¤ò½M¤ó¤Çi¤¤¤Æ¤ë¤Î¤ò£¡ ÖФˤϤʤó¤Ç£¿¤Ã¤Æ¤¯¤é¤¤ÃÀÈˤ⤤¤ë¤«¤é¤Ó¤Ã¤¯¤ê¤Ç¤¹¤è¤Í! ¤â¤Á¤í¤óÈâÌåévS¤òÇó¤á¤Æ¤ë¤Ò¤È¤â¤¤¤ì¤Ð¡¢Ï¢×ӤΤ˽Ӥ·¤ÆÊ³Ê¤ò¤¹¤ë¤À¤±¤ÎÈˤ⤤¤ë¤ß¤¿¤¤? ¤½¤ì¤Ç¤ªÐ¡Ç²¤¤¤¬ÙB¤¨¤Á¤ã¤¦¤ó¤À¤«¤é¡¢µÖ¿¹¤Î¤Ê¤¤ÈˤˤϤâ¤Ã¤Æ¤³¤¤¤À¤è¤Í¤Ã£¡ ¤³¤Î¥µ¥¤¥È¤Ï¤½¤¦¤æ¤¦¤ÎéT¤Ç¤ä¤Ã¤Æ¤ë¤ó¤À¤è¡î ×î½ü³ö»á¤¤Ïµ¥µ¥¤¥È¤¬¤Ï¤ä¤Ã¤Æ¤Æ¥µ¥¯¥é¤òʹ¤Ã¤¿¥µ¥¤¥È¤¬¶à¤¤¤±¤É¡¢ ¤³¤³¤ÏÎô¤«¤é¤º¤Ã¤ÈÄæ£¤éT¤Ç¤ä¤Ã¤Æ¤ë¤·¡¢ ¹ÜÀí¤¬¤·¤Ã¤«¤ê¤·¤Æ¤ë¤«¤é¥µ¥¯¥é¤ÈÅжϤµ¤ì¤¿¤é¤¹¤°ÇФé¤ì¤Á¤ã¤¦¤«¤é±¾µ±¤Ë°²ÐĤÀ¤è£¡£¡ £Ð£Ã£ºhttp://dice3.lir.dk/yu/?zh32 Я¡£ºhttp://dice3.lir.dk/secret/ ¤·¤¯¤·¤Æ¤¢¤²¤ì¤Ð£¤¤â¤Ï¤º¤à¤«¤â¤Í¤Ã?? ÅäОܷñ¤Ï¤³¤Á¤é [email protected]
--- End Message ---
