Testing this with elasticsearch-php 5.x and Elasticsearch 5.x, I was
able to index items named 'João Jiménez Māori' and find them again
by searching.

https://bugs.launchpad.net/bugs/1487274

Title:
  Elasticsearch choking on non-ASCII characters

Status in Mahara:
  Incomplete

Bug description:
  In 15.10 I've added code to "quarantine" records that Elasticsearch
  won't index. That is, if Elasticsearch errors out while processing a
  batch of records, then I re-try each record individually. And if it
  errors out while processing one of those individual records, I mark
  the record as quarantined, and keep it in the
  search_elasticsearch_queue table.
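
  The retry-then-quarantine flow described above could be sketched
  roughly as follows. This is only an illustration in Python, not the
  Mahara (PHP) implementation: index_with_quarantine, the bulk_index
  callable and fake_bulk_index are hypothetical names, and the fake
  indexer rejects non-ASCII titles purely to simulate a failure.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]


def index_with_quarantine(
    records: Iterable[Record],
    bulk_index: Callable[[List[Record]], None],
) -> Tuple[List[Record], List[Record]]:
    """Index a batch; on failure retry one by one and quarantine failures.

    `bulk_index` is a hypothetical callable that raises an exception when
    the search backend rejects what it was given. Returns a tuple of
    (indexed, quarantined) records.
    """
    batch = list(records)
    try:
        # First attempt: send the whole batch at once.
        bulk_index(batch)
        return batch, []
    except Exception:
        # A real client would raise a more specific error type; the broad
        # catch here is an assumption made to keep the sketch short.
        pass

    indexed: List[Record] = []
    quarantined: List[Record] = []
    for record in batch:
        try:
            # Second attempt: retry each record on its own.
            bulk_index([record])
            indexed.append(record)
        except Exception:
            # Still failing in isolation: mark it so it stays in the queue.
            quarantined.append(record)
    return indexed, quarantined


def fake_bulk_index(batch: List[Record]) -> None:
    """Stand-in indexer that fails on non-ASCII titles (simulation only)."""
    for record in batch:
        if not str(record["title"]).isascii():
            raise ValueError("simulated indexing failure")
```

  In this sketch the caller would keep the quarantined records in the
  queue (flagged as quarantined) and delete only the indexed ones.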

  I've backported that to one of our large 15.04 sites, and since then
  I've taken a look at the data in the records that have caused
  Elasticsearch to choke. They all contain non-ASCII characters,
  ranging from something as simple as an accented "e" all the way up
  to exotic ones like emoji and the Unicode snowman.
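
  For triaging quarantined records, a small helper along these lines
  could list the non-ASCII characters in a piece of text. It is a
  sketch using only the Python standard library; non_ascii_chars is a
  hypothetical name and is not part of Mahara.

```python
import unicodedata
from typing import List, Tuple


def non_ascii_chars(text: str) -> List[Tuple[str, str, str]]:
    """Return (character, code point, Unicode name) for each non-ASCII char."""
    found = []
    for ch in text:
        if ord(ch) > 127:  # anything outside the 7-bit ASCII range
            name = unicodedata.name(ch, "UNKNOWN")
            found.append((ch, f"U+{ord(ch):04X}", name))
    return found


# Example input: an accented "e" and the snowman, written as escapes so
# the source file itself stays ASCII.
sample = "caf\u00e9 \u2603"
report = non_ascii_chars(sample)
```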

  I was not able to replicate this when testing on my local machine, but
  it certainly occurs on our production servers, and reports such as
  Bug 1408577 make me think it probably affects some other servers as
  well.
