[GitHub] orc pull request #304: ORC-397. Allow selective disabling of dictionary enco...

wgtmac Tue, 28 Aug 2018 21:39:54 -0700

Github user wgtmac commented on a diff in the pull request:

    https://github.com/apache/orc/pull/304#discussion_r213542675
  
    --- Diff: java/core/src/test/org/apache/orc/TestStringDictionary.java ---
    @@ -409,4 +411,77 @@ public void testTooManyDistinctV11AlwaysDictionary() throws Exception {
     
       }
     
    +  /**
    +   * Test that dictionaries can be disabled, per column. In this test, we want to disable DICTIONARY_V2 for the
    +   * `longString` column (presumably for a low hit-ratio), while preserving DICTIONARY_V2 for `shortString`.
    +   * @throws Exception on unexpected failure
    +   */
    +  @Test
    +  public void testDisableDictionaryForSpecificColumn() throws Exception {
    +    final String SHORT_STRING_VALUE = "foo";
    +    final String  LONG_STRING_VALUE = "BAAAAAAAAR!!";
    +
    +    TypeDescription schema =
    +        TypeDescription.fromString("struct<shortString:string,longString:string>");
    +
    +    Writer writer = OrcFile.createWriter(
    +        testFilePath,
    +        OrcFile.writerOptions(conf).setSchema(schema)
    +            .compress(CompressionKind.NONE)
    +            .bufferSize(10000)
    +            .directEncodingColumns("longString"));
    --- End diff --
    
    Would it be better to support specifying the columns that should use dictionary encoding instead?
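
    To make the two option styles concrete, here is a minimal sketch. The class and method names are hypothetical and not part of ORC; it only contrasts a list of direct-encoded columns (as in the diff) with a list of dictionary-encoded columns (as asked about above), assuming a comma-separated column list in both cases.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper, not part of ORC: contrasts two ways to select per-column encoding. */
class ColumnEncodingChoice {

  /** Parses a comma-separated column list such as "longString, other". */
  public static Set<String> parseColumns(String csv) {
    Set<String> result = new LinkedHashSet<>();
    if (csv == null) {
      return result;
    }
    for (String name : csv.split(",")) {
      String trimmed = name.trim();
      if (!trimmed.isEmpty()) {
        result.add(trimmed);
      }
    }
    return result;
  }

  /** Style in the diff: listed columns are forced to direct encoding, all others may use a dictionary. */
  public static boolean mayUseDictionaryWithDirectList(String column, String directColumns) {
    return !parseColumns(directColumns).contains(column);
  }

  /** Style asked about in the comment: only listed columns may use a dictionary. */
  public static boolean mayUseDictionaryWithDictionaryList(String column, String dictionaryColumns) {
    return parseColumns(dictionaryColumns).contains(column);
  }

  public static void main(String[] args) {
    // Both configurations describe the same outcome for the two-column schema in the test.
    for (String column : new String[] {"shortString", "longString"}) {
      System.out.println(column
          + " directList=" + mayUseDictionaryWithDirectList(column, "longString")
          + " dictionaryList=" + mayUseDictionaryWithDictionaryList(column, "shortString"));
    }
  }
}
```

    For the schema in the test both styles can express the same result. The difference is the default for unlisted columns: with a direct list they remain eligible for dictionary encoding, while with a dictionary list they fall back to direct encoding.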

---
