[GitHub] [phoenix] alejandro-anadon opened a new pull request, #1458: Phoenix 6742 Add UUID type

GitBox Wed, 06 Jul 2022 02:37:11 -0700


alejandro-anadon opened a new pull request, #1458:
URL: https://github.com/apache/phoenix/pull/1458


   As described in the Jira issue https://issues.apache.org/jira/browse/PHOENIX-6742 ,
   these are the changes I consider necessary to implement the UUID type.
   The tests I wrote pass, but I have some doubts that should be resolved
   before merging.
   
   The UUID type is added along with three functions:
   
   -STR_TO_UUID(string) -> UUID
   -UUID_TO_STR(UUID) -> String
   -UUID_RAND() -> UUID
   
   The type is called UUID and is mapped to 16 bytes (more compact and more
   efficient than mapping to a string such as
   "f90b6b50-f1b5-459b-b4dd-d6da4ddb5655").
   
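   To illustrate the size difference, here is a minimal sketch of packing a
   java.util.UUID into 16 bytes and back. The class and method names
   (UuidBytesSketch, toBytes, fromBytes) are hypothetical and are not taken
   from the pull request; the actual encoding used by the UUID type may differ.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper, for illustration only (not the PR's code).
class UuidBytesSketch {
    // Pack the two 64-bit halves of the UUID into 16 big-endian bytes.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Rebuild the UUID from its 16-byte form.
    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long most = buf.getLong();
        long least = buf.getLong();
        return new UUID(most, least);
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("f90b6b50-f1b5-459b-b4dd-d6da4ddb5655");
        byte[] raw = toBytes(uuid);
        System.out.println("binary length: " + raw.length);
        System.out.println("string length: " + uuid.toString().length());
        System.out.println("round trip ok: " + fromBytes(raw).equals(uuid));
    }
}
```
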
   The problem is that the bytes of a UUID can contain 0x00 or 0xFF.
   Therefore, when you want to create an index on this type, due to the
   limitation that non-nulls cannot have a fixed size in the primary key
   (in the index), you have to transform it to another type.
   
   Then, when creating an index on a UUID column, the column is switched to
   another data type: UUID_INDEXABLE (very similar to VARBINARY). In
   addition, UUID_INDEXABLE cannot contain either 0x00 or 0xFF in its bytes.
   
   Therefore, what we do with UUID_INDEXABLE is transform the UUID to a
   string of the form "f90b6b50-f1b5-459b-b4dd-d6da4ddb5655", which contains
   neither 0x00 nor 0xFF (and whose length is always 36, even though the
   class is defined with fixedWidth = false).
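
   The property described above can be checked with a small sketch. It
   assumes the indexable form is simply the canonical text of the UUID
   encoded as ASCII; the names UuidIndexableSketch, toRawBytes,
   toIndexableBytes and hasReservedByte are hypothetical and do not come
   from the pull request.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical illustration, not the PR's UUID_INDEXABLE implementation.
class UuidIndexableSketch {
    // True if any byte equals 0x00 or 0xFF.
    static boolean hasReservedByte(byte[] bytes) {
        for (byte b : bytes) {
            if (b == 0x00 || b == (byte) 0xFF) {
                return true;
            }
        }
        return false;
    }

    // Compact 16-byte form: may contain 0x00 or 0xFF.
    static byte[] toRawBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Assumed indexable form: canonical text as ASCII bytes
    // (only the characters 0-9, a-f and '-').
    static byte[] toIndexableBytes(UUID uuid) {
        return uuid.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // A UUID chosen so that its raw form contains both 0x00 and 0xFF.
        UUID uuid = new UUID(0x00FF000000000000L, 0L);
        byte[] raw = toRawBytes(uuid);
        byte[] indexable = toIndexableBytes(uuid);
        System.out.println("raw has 0x00/0xFF:       " + hasReservedByte(raw));
        System.out.println("indexable has 0x00/0xFF: " + hasReservedByte(indexable));
        System.out.println("indexable length:        " + indexable.length);
    }
}
```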
   
   I had developed an algorithm that compacts UUIDs to 18 bytes without 0x00
   or 0xFF, but it broke the ordering within HBase (it does not sort UUIDs
   properly in a "select uuid from dummy order by uuid"). I guess that is a
   big enough weakness to rule it out; but if some algorithm did this
   correctly, adopting it would be as easy as implementing it in the
   org.apache.phoenix.util.UUIDUtil class.
   If you want to see the algorithm I am referring to (with its explanation),
   I can provide it.
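
   One way to evaluate any candidate encoding is to test whether it keeps
   the same relative order as the 16-byte form under unsigned
   lexicographic byte comparison, which is how HBase orders keys. The
   sketch below is a hypothetical test harness (UuidOrderCheckSketch and
   preservesOrder are made-up names), and it assumes the reference order is
   the unsigned byte order of the 16-byte form. It is applied here to the
   36-character text form only; the 18-byte algorithm mentioned above is not
   reproduced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;
import java.util.UUID;
import java.util.function.Function;

// Hypothetical harness for checking order preservation of an encoding.
class UuidOrderCheckSketch {
    static byte[] rawBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Compares random UUID pairs in raw form and in encoded form, using
    // unsigned lexicographic comparison for both (requires Java 9+).
    static boolean preservesOrder(Function<UUID, byte[]> encoder, int samples, long seed) {
        Random rnd = new Random(seed);
        for (int i = 0; i < samples; i++) {
            UUID a = new UUID(rnd.nextLong(), rnd.nextLong());
            UUID b = new UUID(rnd.nextLong(), rnd.nextLong());
            int expected = Integer.signum(
                    Arrays.compareUnsigned(rawBytes(a), rawBytes(b)));
            int actual = Integer.signum(
                    Arrays.compareUnsigned(encoder.apply(a), encoder.apply(b)));
            if (expected != actual) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Candidate encoding: canonical lowercase text as ASCII bytes.
        Function<UUID, byte[]> textForm =
                u -> u.toString().getBytes(StandardCharsets.US_ASCII);
        System.out.println("text form preserves order: "
                + preservesOrder(textForm, 10_000, 42L));
    }
}
```

   Note that java.util.UUID.compareTo compares the two halves as signed
   longs, so it is not a suitable reference order for this kind of check.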
   
   This data type, UUID_INDEXABLE, must be kept hidden from client programs;
   and if its use in a direct DDL could somehow be forbidden (I haven't found
   a way), that would be ideal.
   
   "UUID ARRAY" is also implemented.
   
   There is no need to implement a "UUID_INDEXABLE ARRAY" because I haven't
   found any use case for it: it would only make sense in the primary key as
   the last value (being an array), and if that were needed, you would use
   UUID ARRAY.
   
   
   I have some doubts:
   1) Is this logic correct (it is the one implemented)?
           assertTrue(PUUID.INSTANCE.isCoercibleTo(PUUIDIndexable.INSTANCE));
           assertFalse(PUUIDIndexable.INSTANCE.isCoercibleTo(PUUID.INSTANCE));
   
   or should it be
   
           assertTrue(PUUID.INSTANCE.isCoercibleTo(PUUIDIndexable.INSTANCE));
           assertTrue(PUUIDIndexable.INSTANCE.isCoercibleTo(PUUID.INSTANCE));
   
   2) For the sqlType I have used:
    PUUID -> 2100
    PUUIDIndexable -> 2101
   
   Should these be other numbers?
   
   2) The ordinal: when inserting UUID and PUUIDIndexable I had to
   "shift down" all the array types. To do so, I created a list of constants
   to make this reordering, and any future changes of order, easier.
   But for this I had to make minimal changes in all the type classes.
   
   For example:
   ...
       private PVarchar() {
           super("VARCHAR", Types.VARCHAR, String.class, null, PDataType.ORDINAL_VARCHAR);
       }
   ...
   (This is why there are modifications in the 48 type classes, but they
   only consist of replacing the numerical value with the constant defined
   in PDataType.)
   
   3) In org.apache.phoenix.jdbc.PhoenixPreparedStatement
      and in org.apache.phoenix.jdbc.PhoenixResultSet I have added a setUUID
   and a getUUID respectively. The client would have the option to do:
   ...
               if (ps instanceof PhoenixPreparedStatement) {
                   PhoenixPreparedStatement phops = (PhoenixPreparedStatement) ps;
                   phops.setUUID(1, uuidRecord1PK); // type errors detected at compile time
               } else {
                   ps.setObject(1, uuidRecord1PK); // type errors detected at runtime
               }
   ....
   
               if (rs instanceof PhoenixResultSet) {
                   PhoenixResultSet phors = (PhoenixResultSet) rs;
                   UUID temp = phors.getUUID(1); // type errors detected at compile time
               } else {
                   UUID temp = (UUID) rs.getObject(1); // type errors detected at runtime
               }
   
   ....
   
   The possible drawback that I see is that since these methods are in
   neither java.sql.ResultSet nor java.sql.PreparedStatement, this "breaks"
   the standard.
   
   4) I built the test "org.apache.phoenix.expression.function.UUIDFunctionTest",
   test name "testUUIDFunctions", based on
   "org.apache.phoenix.expression.function.InstrFunctionTest", test name
   "testInstrFunction". I have seen that it tests with both SortOrder.ASC
   and SortOrder.DESC. In the tests for UUID, SortOrder.ASC gives no
   problems, but SortOrder.DESC gives me strange behaviors that I can't
   explain. I don't know if SortOrder.DESC makes sense in this case or if
   I'm missing something.
   
   5) In general, I don't know whether I have written enough tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

