keith-turner commented on code in PR #3286:
URL: https://github.com/apache/accumulo/pull/3286#discussion_r1166222165


##########
core/src/main/java/org/apache/accumulo/core/metadata/schema/DataFileValue.java:
##########
@@ -20,42 +20,76 @@
 
 import static java.nio.charset.StandardCharsets.UTF_8;
 
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+import java.util.Optional;
+
+import org.apache.accumulo.core.data.Range;
 import org.apache.accumulo.core.data.Value;
 import org.apache.accumulo.core.iteratorsImpl.system.InterruptibleIterator;
 import org.apache.accumulo.core.iteratorsImpl.system.TimeSettingIterator;
+import org.apache.accumulo.core.util.json.RangeAdapter;
+
+import com.google.gson.Gson;
+import com.google.gson.JsonSyntaxException;
 
 public class DataFileValue {
-  private long size;
-  private long numEntries;
+
+  private static final Gson gson = RangeAdapter.createRangeGson();
+
+  private final long size;
+  private final long numEntries;
   private long time = -1;
+  private final List<Range> ranges;
 
-  public DataFileValue(long size, long numEntries, long time) {
-    this.size = size;
-    this.numEntries = numEntries;
-    this.time = time;
+  public DataFileValue(final long size, final long numEntries, final long 
time) {
+    this(size, numEntries, time, null);
+  }
+
+  public DataFileValue(final long size, final long numEntries) {
+    this(size, numEntries, null);
   }
 
-  public DataFileValue(long size, long numEntries) {
+  public DataFileValue(final long size, final long numEntries, final long time,
+      final Collection<Range> ranges) {
     this.size = size;
     this.numEntries = numEntries;
-    this.time = -1;
+    this.time = time;
+    // If ranges is null then just set to empty list and also merge 
overlapping to reduce
+    // the data stored if possible.
+    this.ranges = Optional.ofNullable(ranges).map(Range::mergeOverlapping)
+        .map(Collections::unmodifiableList).orElse(List.of());
   }
 
-  public DataFileValue(String encodedDFV) {
-    String[] ba = encodedDFV.split(",");
-
-    size = Long.parseLong(ba[0]);
-    numEntries = Long.parseLong(ba[1]);
+  public DataFileValue(long size, long numEntries, Collection<Range> ranges) {
+    this(size, numEntries, -1, ranges);
+  }
 
-    if (ba.length == 3) {
-      time = Long.parseLong(ba[2]);
-    } else {
-      time = -1;
+  public static DataFileValue decode(String encodedDFV) {
+    try {
+      // Optimistically try and decode from Json and fall back to decoding the 
legacy format
+      // if json parsing fails. Over time the old format will be replaced with 
the new format
+      // as new values are written or updated.
+      return gson.fromJson(encodedDFV, DataFileValue.class);
+    } catch (JsonSyntaxException e) {
+      return decodeLegacy(encodedDFV);

Review Comment:
   This could be a follow on issue/PR. If we convert all ranges in the upgrade 
code then we don't have to have the legacy fallback here, this will stick 
around in the code long after its actually needed.  If only have the conversion 
in the upgrade code it will make this code a bit cleaner.  We create Upgrader 
classes for going between each version and eventually those are dropped along 
with the legacy code.
   
   This is the metadata upgrade code 
https://github.com/apache/accumulo/blob/30d823def879234b4f390f7b816cef670d477f88/server/manager/src/main/java/org/apache/accumulo/manager/upgrade/Upgrader10to11.java#L112
 
   
   But not sure that is code for going from 2.0 to 2.1 or 2.1 to 3.x.  Not sure 
if we have created a new Upgrader class for going from 2.1
   
   



##########
core/src/main/java/org/apache/accumulo/core/metadata/schema/DataFileValue.java:
##########
@@ -20,42 +20,76 @@
 
 import static java.nio.charset.StandardCharsets.UTF_8;
 
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+import java.util.Optional;
+
+import org.apache.accumulo.core.data.Range;
 import org.apache.accumulo.core.data.Value;
 import org.apache.accumulo.core.iteratorsImpl.system.InterruptibleIterator;
 import org.apache.accumulo.core.iteratorsImpl.system.TimeSettingIterator;
+import org.apache.accumulo.core.util.json.RangeAdapter;
+
+import com.google.gson.Gson;
+import com.google.gson.JsonSyntaxException;
 
 public class DataFileValue {
-  private long size;
-  private long numEntries;
+
+  private static final Gson gson = RangeAdapter.createRangeGson();
+
+  private final long size;
+  private final long numEntries;
   private long time = -1;
+  private final List<Range> ranges;
 
-  public DataFileValue(long size, long numEntries, long time) {
-    this.size = size;
-    this.numEntries = numEntries;
-    this.time = time;
+  public DataFileValue(final long size, final long numEntries, final long 
time) {
+    this(size, numEntries, time, null);
+  }
+
+  public DataFileValue(final long size, final long numEntries) {
+    this(size, numEntries, null);
   }
 
-  public DataFileValue(long size, long numEntries) {
+  public DataFileValue(final long size, final long numEntries, final long time,
+      final Collection<Range> ranges) {
     this.size = size;
     this.numEntries = numEntries;
-    this.time = -1;
+    this.time = time;
+    // If ranges is null then just set to empty list and also merge 
overlapping to reduce
+    // the data stored if possible.
+    this.ranges = Optional.ofNullable(ranges).map(Range::mergeOverlapping)

Review Comment:
   Accumulo uses null a lot in the code and its not a great practice.  We could 
not accept null in this new code, if there are no ranges just pass in empty 
list.



##########
core/src/test/java/org/apache/accumulo/core/metadata/schema/DataFileValueTest.java:
##########
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.metadata.schema;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+import java.util.List;
+
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Range;
+import org.junit.jupiter.api.Test;
+
+public class DataFileValueTest {
+
+  @Test
+  public void testNoTimeOrRanges() {
+    final String json = new DataFileValue(555, 23).encodeAsString();
+    final DataFileValue dfv = DataFileValue.decode(json);
+
+    assertEquals(555, dfv.getSize());
+    assertEquals(23, dfv.getNumEntries());
+    assertEquals(-1, dfv.getTime());
+    assertEquals(0, dfv.getRanges().size());
+  }
+
+  @Test
+  public void testTimeNoRanges() {
+    long time = System.currentTimeMillis();
+    final String json = new DataFileValue(555, 23, time).encodeAsString();
+    final DataFileValue dfv = DataFileValue.decode(json);
+
+    assertEquals(555, dfv.getSize());
+    assertEquals(23, dfv.getNumEntries());
+    assertEquals(time, dfv.getTime());
+    assertEquals(0, dfv.getRanges().size());
+  }
+
+  @Test
+  public void testInfiniteRange() {
+    long time = System.currentTimeMillis();
+    final DataFileValue dfv = new DataFileValue(555, 23, time, List.of(new 
Range()));
+    assertEquals(1, dfv.getRanges().size());
+    assertEquals(new Range(), dfv.getRanges().get(0));
+
+    // Verify serialization
+    final String json = new DataFileValue(555, 23, time, List.of(new 
Range())).encodeAsString();
+    final DataFileValue decoded = DataFileValue.decode(json);
+
+    assertEquals(555, decoded.getSize());
+    assertEquals(23, decoded.getNumEntries());
+    assertEquals(time, decoded.getTime());
+    assertEquals(1, decoded.getRanges().size());
+    assertEquals(new Range(), decoded.getRanges().get(0));
+  }
+
+  @Test
+  public void testComplexRange() {
+    final Key key1 = new Key("row1", "cf1", "cq1", 50);
+    final Key key2 = new Key("row2", "cf2", "cq2", 100);
+    final String json = new DataFileValue(555, 23, List.of(new Range(key1, 
key2))).encodeAsString();
+    final DataFileValue dfv = DataFileValue.decode(json);
+
+    assertEquals(555, dfv.getSize());
+    assertEquals(23, dfv.getNumEntries());
+    assertEquals(-1, dfv.getTime());
+    assertEquals(1, dfv.getRanges().size());
+    assertEquals(new Range(key1, key2), dfv.getRanges().get(0));
+  }
+
+  @Test
+  public void testOverlappingRanges() {
+    long time = System.currentTimeMillis();
+    final DataFileValue dfv =
+        new DataFileValue(555, 23, time, List.of(new Range("abc", "xyz"), new 
Range("def")));
+    // Second range is part of first range so should collapse into 1
+    assertEquals(1, dfv.getRanges().size());
+    assertEquals(new Range("abc", "xyz"), dfv.getRanges().get(0));
+  }
+
+  @Test
+  public void testMultipleRanges() {
+    long time = System.currentTimeMillis();
+    final DataFileValue dfv =
+        new DataFileValue(555, 23, time, List.of(new Range("abc", "xyz"), new 
Range("123", "789")));
+
+    // Ranges will be sorted
+    assertEquals(2, dfv.getRanges().size());
+    assertEquals(new Range("123", "789"), dfv.getRanges().get(0));
+    assertEquals(new Range("abc", "xyz"), dfv.getRanges().get(1));

Review Comment:
   Metadata data ranges are exclusive,inclusive.  Like [this 
code](https://github.com/apache/accumulo/blob/30d823def879234b4f390f7b816cef670d477f88/core/src/main/java/org/apache/accumulo/core/dataImpl/KeyExtent.java#L372)
 converts creates a data range for a tablet. So should test this case, could do 
the following.
   
   
   ```suggestion
           new DataFileValue(555, 23, time, List.of(new Range("abc", false, 
"xyz", true), new Range("123", "789")));
   
       // Ranges will be sorted
       assertEquals(2, dfv.getRanges().size());
       assertEquals(new Range("123", "789"), dfv.getRanges().get(0));
       assertEquals(new Range("abc",false, "xyz",true), dfv.getRanges().get(1));
   ```



##########
core/src/test/java/org/apache/accumulo/core/metadata/schema/DataFileValueTest.java:
##########
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.metadata.schema;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+import java.util.List;
+
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Range;
+import org.junit.jupiter.api.Test;
+
+public class DataFileValueTest {
+
+  @Test
+  public void testNoTimeOrRanges() {
+    final String json = new DataFileValue(555, 23).encodeAsString();
+    final DataFileValue dfv = DataFileValue.decode(json);
+
+    assertEquals(555, dfv.getSize());
+    assertEquals(23, dfv.getNumEntries());
+    assertEquals(-1, dfv.getTime());
+    assertEquals(0, dfv.getRanges().size());
+  }
+
+  @Test
+  public void testTimeNoRanges() {
+    long time = System.currentTimeMillis();
+    final String json = new DataFileValue(555, 23, time).encodeAsString();
+    final DataFileValue dfv = DataFileValue.decode(json);
+
+    assertEquals(555, dfv.getSize());
+    assertEquals(23, dfv.getNumEntries());
+    assertEquals(time, dfv.getTime());
+    assertEquals(0, dfv.getRanges().size());
+  }
+
+  @Test
+  public void testInfiniteRange() {
+    long time = System.currentTimeMillis();
+    final DataFileValue dfv = new DataFileValue(555, 23, time, List.of(new 
Range()));
+    assertEquals(1, dfv.getRanges().size());
+    assertEquals(new Range(), dfv.getRanges().get(0));
+
+    // Verify serialization
+    final String json = new DataFileValue(555, 23, time, List.of(new 
Range())).encodeAsString();
+    final DataFileValue decoded = DataFileValue.decode(json);
+
+    assertEquals(555, decoded.getSize());
+    assertEquals(23, decoded.getNumEntries());
+    assertEquals(time, decoded.getTime());
+    assertEquals(1, decoded.getRanges().size());
+    assertEquals(new Range(), decoded.getRanges().get(0));
+  }
+
+  @Test
+  public void testComplexRange() {
+    final Key key1 = new Key("row1", "cf1", "cq1", 50);
+    final Key key2 = new Key("row2", "cf2", "cq2", 100);
+    final String json = new DataFileValue(555, 23, List.of(new Range(key1, 
key2))).encodeAsString();
+    final DataFileValue dfv = DataFileValue.decode(json);
+
+    assertEquals(555, dfv.getSize());
+    assertEquals(23, dfv.getNumEntries());
+    assertEquals(-1, dfv.getTime());
+    assertEquals(1, dfv.getRanges().size());
+    assertEquals(new Range(key1, key2), dfv.getRanges().get(0));
+  }
+
+  @Test
+  public void testOverlappingRanges() {
+    long time = System.currentTimeMillis();
+    final DataFileValue dfv =
+        new DataFileValue(555, 23, time, List.of(new Range("abc", "xyz"), new 
Range("def")));
+    // Second range is part of first range so should collapse into 1
+    assertEquals(1, dfv.getRanges().size());
+    assertEquals(new Range("abc", "xyz"), dfv.getRanges().get(0));
+  }
+
+  @Test
+  public void testMultipleRanges() {
+    long time = System.currentTimeMillis();
+    final DataFileValue dfv =
+        new DataFileValue(555, 23, time, List.of(new Range("abc", "xyz"), new 
Range("123", "789")));
+
+    // Ranges will be sorted
+    assertEquals(2, dfv.getRanges().size());
+    assertEquals(new Range("123", "789"), dfv.getRanges().get(0));
+    assertEquals(new Range("abc", "xyz"), dfv.getRanges().get(1));
+
+    // Test encoding
+    final String json =
+        new DataFileValue(700, 25, time, List.of(new Range("abc", "xyz"), new 
Range("123", "789")))
+            .encodeAsString();
+    final DataFileValue decoded = DataFileValue.decode(json);
+
+    assertEquals(700, decoded.getSize());
+    assertEquals(25, decoded.getNumEntries());
+    assertEquals(time, decoded.getTime());
+    assertEquals(2, decoded.getRanges().size());
+    assertEquals(2, decoded.getRanges().size());

Review Comment:
   Seems like a duplicate line
   
   ```suggestion
       assertEquals(2, decoded.getRanges().size());
   ```



##########
core/src/main/java/org/apache/accumulo/core/metadata/schema/DataFileValue.java:
##########
@@ -20,42 +20,76 @@
 
 import static java.nio.charset.StandardCharsets.UTF_8;
 
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+import java.util.Optional;
+
+import org.apache.accumulo.core.data.Range;
 import org.apache.accumulo.core.data.Value;
 import org.apache.accumulo.core.iteratorsImpl.system.InterruptibleIterator;
 import org.apache.accumulo.core.iteratorsImpl.system.TimeSettingIterator;
+import org.apache.accumulo.core.util.json.RangeAdapter;
+
+import com.google.gson.Gson;
+import com.google.gson.JsonSyntaxException;
 
 public class DataFileValue {
-  private long size;
-  private long numEntries;
+
+  private static final Gson gson = RangeAdapter.createRangeGson();
+
+  private final long size;
+  private final long numEntries;
   private long time = -1;
+  private final List<Range> ranges;

Review Comment:
   We need row ranges, but the Range class if for Key ranges so it includes 
row,col,vis,timestamp, etc.  So range include more than we need, wonder how 
this impact serialization.  Wondering it results in extra unnecessary data.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to