Do you have a stack trace for that exception? It might be in the impalad logs. That would help identify where it goes wrong and may point toward a fix or workaround.
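For reference while looking for the trace: the wording of the error matches what Python raises when non-ASCII text is encoded with the 'ascii' codec. Below is a minimal sketch of that failure, not a confirmed diagnosis. It assumes the error comes from the Python side of impala-shell falling back to ASCII when output is not going to a terminal. The helper name try_encode is made up for illustration.

```python
import sys

def try_encode(text, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)

# The character echoed in one of the reported errors.
sample = u"\u4e0e"

# What Python reports for stdout; this can differ between an interactive
# terminal and a script or pipe (assumption: that difference matters here).
print("stdout encoding:", sys.stdout.encoding)

# The 'ascii' codec cannot represent the character; UTF-8 can.
print(try_encode(sample, "ascii"))
print(try_encode(sample, "utf-8"))
```

If that assumption holds, setting the standard Python environment variable PYTHONIOENCODING=utf-8 in the bash script before calling impala-shell -q may be worth a try. I have not verified that it helps in this case.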
On Mon, May 15, 2017 at 11:51 PM, John Russell <[email protected]> wrote:

> Round 2 of diagnosis. The Chinese characters, e.g. 语句, come through fine
> when I run the query interactively in impala-shell, but not in impala-shell
> -q through a bash script. I tried bash idioms like:
>
> stty iutf8
>
> export LC_CTYPE=C
> export LANG=C
>
> export LC_CTYPE=zh_CN.utf8
> export LANG=zh_CN.utf8
>
> to no avail. This is different from IMPALA-532 where the problem is due
> to specifying a non-existent locale.
>
> Thanks,
> John
>
>
> On May 15, 2017, at 11:34 PM, John Russell <[email protected]> wrote:
> >
> > I'm running some impala-shell queries against Parquet files with
> > user-entered strings that are causing character encoding problems. I get
> > Chinese characters coming through just fine in results. There must be some
> > more exotic or non-UTF8 characters somewhere in the input. The errors look
> > like the following (citing different positions, sometimes echoing a u''
> > codepoint, always mentioning range(128)):
> >
> > Unknown Exception : 'ascii' codec can't encode characters in position
> > 875-876: ordinal not in range(128)
> > Could not execute command: select int_col, string_col from report where
> > string_col like "%${var:component}%" limit 250
> >
> > Unknown Exception : 'ascii' codec can't encode character u'\u4e0e' in
> > position 3698: ordinal not in range(128)
> > Could not execute command: select int_col, string_col from report where
> > string_col like "%${var:component}%" limit 250
> >
> > Is there a WHERE technique or string regularizer function I could use to
> > skip over strings containing unrecognizable characters? SET MAX_ERRORS=0
> > and/or ABORT_ON_ERROR=0 in advance of the queries didn't help. If I reduce
> > the LIMIT to something very low, the queries tend to work -- they seem to
> > fail on the first instance encountered of any problematic character. The
> > impala-shell commands are being issued from a bash script.
> > ${var:component} is a Hadoop-related name like 'impala' or 'kafka'.
> >
> > Thanks,
> > John
