alejandro-anadon opened a new pull request, #1458: URL: https://github.com/apache/phoenix/pull/1458
As described in the jira https://issues.apache.org/jira/browse/PHOENIX-6742, these are the changes I consider necessary to implement the UUID type. The tests I wrote pass, but I have some doubts that should be resolved before merging.

The UUID type is added along with three functions:
- STR_TO_UUID(string) -> UUID
- UUID_TO_STR(UUID) -> String
- UUID_RAND() -> UUID

The type is called UUID and is mapped to 16 bytes (more compact and more efficient than mapping it to a string such as "f90b6b50-f1b5-459b-b4dd-d6da4ddb5655").

The problem is that a UUID can contain the bytes 0x00 or 0xFF. Therefore, when you create an index on this type, because of the limitation that non-null values cannot contain a 0x00 or 0xFF byte and cannot have a fixed size in the primary key (in the index), it has to be transformed to another type. So, when an index is created on a UUID column, the column is switched to another data type: UUID_INDEXABLE (very similar to VARBINARY). In addition, UUID_INDEXABLE can contain neither 0x00 nor 0xFF in its bytes. Therefore, UUID_INDEXABLE stores the UUID as a string of the form "f90b6b50-f1b5-459b-b4dd-d6da4ddb5655", in which there is neither 0x00 nor 0xFF (and the length is always 36, even though the class is defined with fixedWidth = false).

I had developed an algorithm that compacts UUIDs to 18 bytes without 0x00 or 0xFF, but it broke the ordering within HBase (it does not sort UUIDs properly in a "select uuid from dummy order by uuid"). I think that is a big enough weakness to rule it out; but if there were an algorithm that did this correctly, adopting it would be as easy as implementing it in the org.apache.phoenix.util.UUIDUtil class. If you want to see the algorithm I am referring to (with its explanation), I can provide it.

The UUID_INDEXABLE data type must be kept hidden from client programs; and if its use in a direct DDL could somehow be forbidden (I haven't found a way), that would be ideal.
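To make the two encodings discussed above concrete, here is a rough sketch. It is not the code in this PR: it assumes the 16-byte form is the big-endian concatenation of the UUID's most and least significant 64 bits, and that the indexable form is the ASCII bytes of the canonical 36-character string. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical illustration only; not the PUUID / PUUIDIndexable implementation.
final class UuidEncodingSketch {

    // Assumed compact form: 16 bytes, most significant bits first.
    static byte[] toBytes(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    // Assumed indexable form: the canonical string as 36 ASCII bytes.
    static byte[] toIndexableBytes(UUID u) {
        return u.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // True if any byte is 0x00 or 0xFF, the two values discussed above.
    static boolean containsReservedByte(byte[] bytes) {
        for (byte b : bytes) {
            if (b == 0x00 || b == (byte) 0xFF) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        UUID a = UUID.fromString("00000000-0000-0000-0000-0000000000ff");
        UUID b = UUID.fromString("f90b6b50-f1b5-459b-b4dd-d6da4ddb5655");

        System.out.println("compact form has 0x00/0xFF:   "
                + containsReservedByte(toBytes(a)));
        System.out.println("indexable form has 0x00/0xFF: "
                + containsReservedByte(toIndexableBytes(a)));

        // Compare the ordering of the two forms for this pair of values.
        int byteOrder = Integer.signum(Arrays.compareUnsigned(toBytes(a), toBytes(b)));
        int stringOrder = Integer.signum(a.toString().compareTo(b.toString()));
        System.out.println("same relative order: " + (byteOrder == stringOrder));
    }
}
```

The string form costs 36 bytes instead of 16, but because lower-case hex digits sort in value order in ASCII and the hyphens sit at fixed positions, it compares in the same order as the unsigned 16-byte form.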
"UUID ARRAY" is also implemented. There is no need to implement a "UUID_INDEXABLE ARRAY" because I haven't found any use case for it: it would only make sense in the primary key as the last value (being an array), and if that were needed, UUID ARRAY has to be used.

I have some open questions:

1) Is this logic correct (it is the one implemented)?

    assertTrue(PUUID.INSTANCE.isCoercibleTo(PUUIDIndexable.INSTANCE));
    assertFalse(PUUIDIndexable.INSTANCE.isCoercibleTo(PUUID.INSTANCE));

or should it be

    assertTrue(PUUID.INSTANCE.isCoercibleTo(PUUIDIndexable.INSTANCE));
    assertTrue(PUUIDIndexable.INSTANCE.isCoercibleTo(PUUID.INSTANCE));

2) For the sqlType I have used:

    PUUID -> 2100
    PUUIDIndexable -> 2101

Should these be other numbers?

2) Regarding the ordinal: when inserting UUID and PUUIDIndexable I had to "shift down" all the Array types. To do so, I created a list of constants in order to make this reordering, and possible future changes of order, easier. But for this I had to make minimal changes in all the type classes. For example:

    ...
    private PVarchar() {
        super("VARCHAR", Types.VARCHAR, String.class, null, PDataType.ORDINAL_VARCHAR);
    }
    ...

(This is why there are modifications in the 48 type classes, but each one only replaces the numerical value with the constant defined in PDataType.)

3) In org.apache.phoenix.jdbc.PhoenixPreparedStatement and in org.apache.phoenix.jdbc.PhoenixResultSet I have added a setUUID and a getUUID respectively. The client would have the option to do:

    ...
    if (ps instanceof PhoenixPreparedStatement) {
        PhoenixPreparedStatement phops = (PhoenixPreparedStatement) ps;
        phops.setUUID(1, uuidRecord1PK); //Compile-time error detection
    } else {
        ps.setObject(1, uuidRecord1PK); //Runtime error detection
    }
    ....
    if (rs instanceof PhoenixResultSet) {
        PhoenixResultSet phors = (PhoenixResultSet) rs;
        UUID temp = phors.getUUID(1); //Compile-time error detection
    } else {
        UUID temp = (UUID) rs.getObject(1); //Runtime error detection
    }
    ....

The possible drawback that I see is that, since these methods are in neither java.sql.ResultSet nor java.sql.PreparedStatement, this "breaks" the standard.

4) I built the test "org.apache.phoenix.expression.function.UUIDFunctionTest" (test name "testUUIDFunctions") based on "org.apache.phoenix.expression.function.InstrFunctionTest" (test name "testInstrFunction"). I have seen that they test with both SortOrder.ASC and SortOrder.DESC. In the UUID tests, SortOrder.ASC gives no problems, but SortOrder.DESC gives strange behaviors that I can't explain. I don't know whether SortOrder.DESC makes sense in this case or whether I'm missing something.

5) In general, I don't know whether I have written enough tests.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
