Yes, this needs to be tested and confirmed. I will work on it. Would be great to get more details about indexes. I'm not sure I understand the limitation there.
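A quick way to start checking this is to compare encoded sizes with the charsets built into the JDK. This is only a sketch and uses no Ignite code. The class name and sample strings are made up for illustration, and it assumes the JDK build ships the windows-1251 and GBK charsets (standard full JDKs do, but the Java spec does not guarantee it).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedSizeCheck {
    /** Returns the number of bytes the string occupies in the given charset. */
    static int encodedSize(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // Russian "Privet" (6 Cyrillic letters), written with Unicode escapes.
        String cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442";
        // Chinese "ni hao" (2 ideographs).
        String chinese = "\u4f60\u597d";

        // Assumption: these charsets are available in the running JDK.
        Charset cp1251 = Charset.forName("windows-1251");
        Charset gbk = Charset.forName("GBK");

        System.out.println("Cyrillic, UTF-8:        " + encodedSize(cyrillic, StandardCharsets.UTF_8));
        System.out.println("Cyrillic, windows-1251: " + encodedSize(cyrillic, cp1251));
        System.out.println("Chinese,  UTF-8:        " + encodedSize(chinese, StandardCharsets.UTF_8));
        System.out.println("Chinese,  GBK:          " + encodedSize(chinese, gbk));
    }
}
```

For these samples the Cyrillic string takes 12 bytes in UTF-8 and 6 in windows-1251, so a single-byte code page does halve it. The Chinese string takes 6 bytes in UTF-8 and 4 in GBK: no single-byte charset can cover Chinese, so the saving there is 3 bytes down to 2 per character, not down to 1.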
-Val

On Mon, Jul 3, 2017 at 7:21 AM, Dmitriy Setrakyan <[email protected]> wrote:

> Agree with Valya on the system-wide default. We need to have it.
>
> Also, are we certain that the encoding will provide 1-byte length for UTF-8
> for different languages? Would be nice to test it to confirm, as it has a
> potential to decrease the Ignite storage space by 2x in certain cases.
>
> D.
>
> On Sun, Jul 2, 2017 at 12:26 PM, Valentin Kulichenko <[email protected]> wrote:
>
> > Vova,
> >
> > That's actually a good point. Probably that would be enough and there is
> > no need to introduce an abstract encoder. However, I still think it makes
> > sense to specify default encoding in BinaryConfiguration and
> > BinaryTypeConfiguration.
> >
> > -Val
> >
> > On Sun, Jul 2, 2017 at 10:31 AM Vladimir Ozerov <[email protected]> wrote:
> >
> > > Yes, this is exactly what non-UTF8 encodings do.
> > >
> > > On Sun, Jul 2, 2017 at 20:08, Dmitriy Setrakyan <[email protected]> wrote:
> > >
> > > > On Sun, Jul 2, 2017 at 9:50 AM, Vladimir Ozerov <[email protected]> wrote:
> > > >
> > > > > There is no need for custom encoders, as they are already built-in
> > > > > to Java.
> > > >
> > > > Will non-ASCII encodings fit into 1 byte? The whole point here is to
> > > > save space.
> > > >
> > > > > On Sun, Jul 2, 2017 at 19:16, Dmitriy Setrakyan <[email protected]> wrote:
> > > > >
> > > > > > Vladimir, how would you plug in custom encoders in your design?
> > > > > >
> > > > > > On Sat, Jul 1, 2017 at 11:53 PM, Vladimir Ozerov <[email protected]> wrote:
> > > > > >
> > > > > > > Valya,
> > > > > > >
> > > > > > > Personally I vote against this feature. BinaryConfiguration is
> > > > > > > proven to be inconvenient, since it has to be configured before
> > > > > > > node start, it cannot be changed in runtime, and it requires
> > > > > > > classes on the server. Moreover, if you decide to change
> > > > > > > encoding at some point, it would be impossible.
> > > > > > >
> > > > > > > I think we should add this feature on API level instead. If a
> > > > > > > string is written in non-UTF8 form, we will write it in a
> > > > > > > different format:
> > > > > > > [encoding_code][string]
> > > > > > >
> > > > > > > BinaryWriter.writeString(String fieldName, String val);
> > > > > > > BinaryWriter.writeString(String fieldName, String val, *String encoding*);
> > > > > > >
> > > > > > > BinaryReader.readString(String fieldName);
> > > > > > > BinaryReader.readString(String fieldName, *String encoding*);
> > > > > > >
> > > > > > > BinaryObjectBuilder.writeString(String fieldName, String val, *String encoding*);
> > > > > > >
> > > > > > > class MyClass {
> > > > > > >     *@BinaryString(encoding = "Cp1251")*
> > > > > > >     private String myCyrillicString;
> > > > > > > }
> > > > > > >
> > > > > > > Vladimir.
> > > > > > >
> > > > > > > On Sat, Jul 1, 2017 at 7:26 PM, Dmitriy Setrakyan <[email protected]> wrote:
> > > > > > >
> > > > > > > > On Sat, Jul 1, 2017 at 2:24 AM, Sergi Vladykin <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > In SQL indexes we may store partial strings and assume them
> > > > > > > > > to be in UTF-8, I don't think this can be abstracted away.
> > > > > > > > > But maybe this is not a big deal if in indexes we still
> > > > > > > > > will use UTF-8.
> > > > > > > >
> > > > > > > > Sergi, why does it matter if it is UTF8 or custom encoding?
> > > > > > > > Why can't we use our own compact encoding in indexes?
> > > > > > > >
> > > > > > > > > 2017-07-01 10:13 GMT+03:00 Dmitriy Setrakyan <[email protected]>:
> > > > > > > > >
> > > > > > > > > > Val, do you know how we compare strings in SQL queries?
> > > > > > > > > > Will we be able to use this encoder?
> > > > > > > > > >
> > > > > > > > > > Additionally, I think that the encoder is a bit too
> > > > > > > > > > abstract. Why not go even further and allow users to
> > > > > > > > > > create their own ASCII table for encoding?
> > > > > > > > > >
> > > > > > > > > > D.
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 30, 2017 at 6:49 PM, Valentin Kulichenko <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Andrey,
> > > > > > > > > > >
> > > > > > > > > > > Can you elaborate more on this? What is your concern?
> > > > > > > > > > >
> > > > > > > > > > > -Val
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jun 30, 2017 at 6:17 PM Andrey Mashenkov <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Val,
> > > > > > > > > > > >
> > > > > > > > > > > > Looks like it makes sense.
> > > > > > > > > > > >
> > > > > > > > > > > > This will not affect FullText index, as Lucene has
> > > > > > > > > > > > its own format for storing data.
> > > > > > > > > > > >
> > > > > > > > > > > > But would it be compatible with H2 indexing? I doubt it.
> > > > > > > > > > > >
> > > > > > > > > > > > On Jul 1, 2017 at 2:27, "Valentin Kulichenko" <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Folks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Currently binary marshaller always encodes strings
> > > > > > > > > > > > > in UTF-8. However, sometimes it can be useful to
> > > > > > > > > > > > > customize this. For example, if data contains a lot
> > > > > > > > > > > > > of Cyrillic, Chinese or other symbols, but not so
> > > > > > > > > > > > > many Latin symbols, memory is used very
> > > > > > > > > > > > > inefficiently. In this case it would be great to
> > > > > > > > > > > > > encode most frequently used symbols in one byte
> > > > > > > > > > > > > instead of two or three.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I propose to introduce BinaryStringEncoder
> > > > > > > > > > > > > interface that will convert strings to byte arrays
> > > > > > > > > > > > > and back, and make it pluggable via
> > > > > > > > > > > > > BinaryConfiguration. This will allow users to plug
> > > > > > > > > > > > > in any encoding algorithms based on their
> > > > > > > > > > > > > requirements.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thoughts?
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-5655
> > > > > > > > > > > > >
> > > > > > > > > > > > > -Val
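
For reference, the [encoding_code][string] layout suggested in the quoted thread could look roughly like the sketch below. Everything here is hypothetical: the class name, the one-byte code table, and the 4-byte length prefix are assumptions made for illustration, not the Ignite binary format or any existing Ignite API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedStringSketch {
    // Hypothetical code table; the thread does not define actual codes.
    static final Charset[] CHARSETS = {
        StandardCharsets.UTF_8,          // code 0
        Charset.forName("windows-1251")  // code 1 (assumed to be available in the JDK)
    };

    /** Writes [encoding_code: 1 byte][length: 4 bytes][encoded string bytes]. */
    static byte[] write(String val, int code) {
        byte[] body = val.getBytes(CHARSETS[code]);
        return ByteBuffer.allocate(1 + 4 + body.length)
            .put((byte) code)
            .putInt(body.length)
            .put(body)
            .array();
    }

    /** Reads a string back, picking the charset from the leading code byte. */
    static String read(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int code = buf.get();
        byte[] body = new byte[buf.getInt()];
        buf.get(body);
        return new String(body, CHARSETS[code]);
    }
}
```

Because the code travels with each value, a reader needs no node-wide configuration to decode it, which is the property the API-level proposal relies on. The cost is one extra byte per string.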
