[
https://issues.apache.org/jira/browse/PHOENIX-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392505#comment-15392505
]
James Taylor commented on PHOENIX-3112:
---------------------------------------
Thanks for unit test, [~pierre.lacave]. The root cause is the large VARBINARY
value which is tripping the partial row scanning logic in HBase which Phoenix
does not handle correctly. Phoenix needs to be able to load an entire row's
worth of data. You can override the client-side
{{hbase.client.scanner.max.result.size}} config parameter to be larger than the
default value to prevent this issue.
Adding the following to your unit test causes it to pass:
{code}
@BeforeClass
@Shadower(classBeingShadowed = BaseHBaseManagedTimeIT.class)
public static void doSetup() throws Exception {
Map<String, String> clientProps = Maps.newHashMapWithExpectedSize(1);
clientProps.put(HConstants.HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE_KEY,
Long.toString(Long.MAX_VALUE));
setUpTestDriver(new ReadOnlyProps(clientProps.entrySet().iterator()));
}
{code}
It's possible that Phoenix could support the transfer of partial rows to the
client and combine them there, but I'm not sure this would buy us much as in
the end as JDBC need the entire row to be in memory.
> Rows randomly missing values with large select
> ----------------------------------------------
>
> Key: PHOENIX-3112
> URL: https://issues.apache.org/jira/browse/PHOENIX-3112
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.7.0
> Reporter: Pierre Lacave
>
> When doing a select of a relatively large table (a few touthands rows) some
> rows return partially missing.
> When increasing the fitler to return those specific rows, the values appear
> as expected
> {noformat}
> CREATE TABLE IF NOT EXISTS TEST (
> BUCKET VARCHAR,
> TIMESTAMP_DATE TIMESTAMP,
> TIMESTAMP UNSIGNED_LONG NOT NULL,
> SRC VARCHAR,
> DST VARCHAR,
> ID VARCHAR,
> ION VARCHAR,
> IC BOOLEAN NOT NULL,
> MI UNSIGNED_LONG,
> AV UNSIGNED_LONG,
> MA UNSIGNED_LONG,
> CNT UNSIGNED_LONG,
> DUMMY VARCHAR
> CONSTRAINT pk PRIMARY KEY (BUCKET, TIMESTAMP DESC, SRC, DST, ID, ION, IC)
> );{noformat}
> using a python script to generate a CSV with 5000 rows
> {noformat}
> for i in xrange(5000):
> print "5SEC,2016-07-21
> 07:25:35.{i},146908593500{i},WWWWWWWW,AAA,BBBB,CCCCCCCC,false,{i}1181000,1788000{i},2497001{i},{i},aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa{i}".format(i=i)
> {noformat}
> bulk inserting the csv in the table
> {noformat}
> phoenix/bin/psql.py localhost -t TEST large.csv
> {noformat}
> here we can see one row that contains no TIMESTAMP_DATE and null values in MI
> and MA
> {noformat}
> 0: jdbc:phoenix:localhost:2181> select * from TEST
> ....
> +---------+--------------------------+-------------------+-----------+------+-------+-----------+--------+--------------+--------------+--------------+-------+----------------------------------------------------------------------------+
> | BUCKET | TIMESTAMP_DATE | TIMESTAMP | SRC | DST |
> ID | ION | IC | MI | AV | MA |
> CNT | DUMMY
> |
> +---------+--------------------------+-------------------+-----------+------+-------+-----------+--------+--------------+--------------+--------------+-------+----------------------------------------------------------------------------+
> | 5SEC | 2016-07-21 07:25:35.100 | 1469085935001000 | WWWWWWWW | AAA |
> BBBB | CCCCCCCC | false | 10001181000 | 17880001000 | 24970011000 |
> 1000 |
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1000 |
> | 5SEC | 2016-07-21 07:25:35.999 | 146908593500999 | WWWWWWWW | AAA |
> BBBB | CCCCCCCC | false | 9991181000 | 1788000999 | 2497001999 | 999
> | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa999
> |
> | 5SEC | 2016-07-21 07:25:35.998 | 146908593500998 | WWWWWWWW | AAA |
> BBBB | CCCCCCCC | false | 9981181000 | 1788000998 | 2497001998 | 998
> | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa998
> |
> | 5SEC | | 146908593500997 | WWWWWWWW | AAA |
> BBBB | CCCCCCCC | false | null | 1788000997 | null | 997
> |
> |
> | 5SEC | 2016-07-21 07:25:35.996 | 146908593500996 | WWWWWWWW | AAA |
> BBBB | CCCCCCCC | false | 9961181000 | 1788000996 | 2497001996 | 996
> | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa996
> |
> | 5SEC | 2016-07-21 07:25:35.995 | 146908593500995 | WWWWWWWW | AAA |
> BBBB | CCCCCCCC | false | 9951181000 | 1788000995 | 2497001995 | 995
> | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa995
> |
> | 5SEC | 2016-07-21 07:25:35.994 | 146908593500994 | WWWWWWWW | AAA |
> BBBB | CCCCCCCC | false | 9941181000 | 1788000994 | 2497001994 | 994
> | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa994
> |
> ....
> {noformat}
> but when selecting that row specifically the values are correct
> {noformat}
> 0: jdbc:phoenix:localhost:2181> select * from TEST where timestamp =
> 146908593500997;
> +---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+
> | BUCKET | TIMESTAMP_DATE | TIMESTAMP | SRC | DST |
> ID | ION | IC | MI | AV | MA | CNT |
> DUMMY |
> +---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+
> | 5SEC | 2016-07-21 07:25:35.997 | 146908593500997 | WWWWWWWW | AAA |
> BBBB | CCCCCCCC | false | 9971181000 | 1788000997 | 2497001997 | 997 |
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa997 |
> +---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+
> 1 row selected (0.159 seconds){noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)