On Sat, Oct 25, 2008 at 12:46 AM, Ho-Sheng Hsiao <[EMAIL PROTECTED]> wrote:
>
> I don't know which record it is barfing on. Pulling a single record out:
>
> {
>   "unihan_version": "5.1.0",
>   "unihan": {
>     "kIRG_GSource":"HZ",
>     "kOtherNumeric":"7",
>     "kIRGHanyuDaZidian":"10004.020",
>     "kDefinition":"the original form for \u4e03 U+4E03",
>     "kCihaiT":"10.601",
>     "kPhonetic":"1635",
>     "kMandarin":"QI1",
>     "kCantonese":"cat1",
>     "kRSKangXi":"1.1",
>     "kHanYu":"10004.020",
>     "kRSUnicode":"1.1",
>     "kIRGKangXi":"0076.021"},
>   "_id":"U+20001"
>   }
> }
>
> Seems to work fine even with the bulk uploader.
>
> I'm going to attempt to insert the records one by one. Maybe I can find
> out which record it is barfing on, maybe the json was invalid. It seems
> to me though, that something is barfing on utf8 on bulk uploads over a
> certain limit.
>
> If someone wants to try it out, I can supply the json file I used. Any
> help is appreciated.
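Before inserting one by one, it might be quicker to check the records locally. What follows is only a rough sketch, assuming the records are already parsed into a Python list of dicts. The function name find_bad_records and the "broken" sample record are made up for illustration, and a lone surrogate is just one possible cause of a utf8 error, not a diagnosis of this file.

```python
import json


def find_bad_records(docs):
    """Return (index, _id, error) for each record that fails a UTF-8 JSON round trip."""
    bad = []
    for i, doc in enumerate(docs):
        try:
            # ensure_ascii=False keeps non-ASCII characters unescaped, so the
            # UTF-8 encode step raises on lone surrogates (half of a pair).
            raw = json.dumps(doc, ensure_ascii=False).encode("utf-8")
            json.loads(raw.decode("utf-8"))
        except ValueError as exc:  # covers UnicodeError and JSON decode errors
            doc_id = doc.get("_id") if isinstance(doc, dict) else None
            bad.append((i, doc_id, str(exc)))
    return bad


# Hypothetical sample data: the first record is trimmed from the one quoted
# above, the second is invented and carries a lone high surrogate.
docs = [
    {"_id": "U+20001", "unihan": {"kDefinition": "the original form for \u4e03 U+4E03"}},
    {"_id": "broken", "unihan": {"kDefinition": "\ud840"}},
]

bad = find_bad_records(docs)
for index, doc_id, error in bad:
    print(index, doc_id, error)
```

Anything this reports would be worth pulling out and uploading on its own. If it reports nothing, the records themselves are probably well-formed and the problem is more likely in how the bulk request is built or sent.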
If you don't mind, I'll take a look at it. The error you showed sure
looks like a utf8 error, but with such a big bulk upload it's hard to be
sure.

Perhaps you can put the Unihan-5.1.0.json file online somewhere, or if
you have it boiled down to records that are causing the problem,
singling those out would of course be helpful.

Thanks,
Chris

--
Chris Anderson
http://jchris.mfdz.com