Jackie-Jiang commented on code in PR #16094:
URL: https://github.com/apache/pinot/pull/16094#discussion_r2198331309
##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java:
##########
@@ -734,16 +734,19 @@ private static void addFieldToPinotSchema(Schema
pinotSchema, DataType dataType,
public static List<Map<String, String>> flatten(String jsonString,
JsonIndexConfig jsonIndexConfig)
throws IOException {
JsonNode jsonNode;
+ List<Map<String, String>> flattenedJson;
try {
jsonNode = JsonUtils.stringToJsonNode(jsonString);
- } catch (JsonProcessingException e) {
+ flattenedJson = JsonUtils.flatten(jsonNode, jsonIndexConfig);
Review Comment:
(nit) We can directly `return`
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/inv/text/LuceneFSTIndexCreator.java:
##########
@@ -50,36 +56,56 @@ public class LuceneFSTIndexCreator implements
FSTIndexCreator {
*
* @param indexDir Index directory
* @param columnName Column name for which index is being created
+ * @param tableNameWithType table name with type
* @param sortedEntries Sorted entries of the unique values of the column.
* @throws IOException
*/
- public LuceneFSTIndexCreator(File indexDir, String columnName, String[]
sortedEntries)
+ public LuceneFSTIndexCreator(File indexDir, String columnName, String
tableNameWithType, String[] sortedEntries)
throws IOException {
+ this(indexDir, columnName, tableNameWithType, sortedEntries, new
FSTBuilder());
+ }
+
+ @VisibleForTesting
+ public LuceneFSTIndexCreator(File indexDir, String columnName, String
tableNameWithType, String[] sortedEntries,
+ FSTBuilder fstBuilder)
+ throws IOException {
+ _tableNameWithType = tableNameWithType;
_fstIndexFile = new File(indexDir, columnName +
V1Constants.Indexes.LUCENE_V912_FST_INDEX_FILE_EXTENSION);
- _fstBuilder = new FSTBuilder();
+ _fstBuilder = fstBuilder;
_dictId = 0;
if (sortedEntries != null) {
for (_dictId = 0; _dictId < sortedEntries.length; _dictId++) {
- _fstBuilder.addEntry(sortedEntries[_dictId], _dictId);
+ try {
+ _fstBuilder.addEntry(sortedEntries[_dictId], _dictId);
+ } catch (IOException ex) {
Review Comment:
Is `IOException` enough? Should we always catch `Exception`?
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/inv/json/BaseJsonIndexCreator.java:
##########
@@ -94,6 +101,27 @@ public void add(String jsonString)
addFlattenedRecords(JsonUtils.flatten(jsonString, _jsonIndexConfig));
}
+ @Override
+ public void add(Object value)
+ throws IOException {
+ String valueToAdd;
+ try {
+ valueToAdd = JsonUtils.objectToString(value);
+ } catch (JsonProcessingException e) {
+ if (_jsonIndexConfig.getSkipInvalidJson()) {
+ // Caught exception while trying to add, update metric and add a
default SKIPPED_FLATTENED_RECORD
+ String metricKeyName =
+ _tableNameWithType + "-" +
JsonIndexType.INDEX_DISPLAY_NAME.toUpperCase(Locale.US) + "-indexingError";
+ ServerMetrics.get().addMeteredTableValue(metricKeyName,
ServerMeter.INDEXING_FAILURES, 1);
+ addFlattenedRecords(JsonUtils.SKIPPED_FLATTENED_RECORD);
+ return;
Review Comment:
Agree we can make it a separate PR
##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java:
##########
@@ -734,16 +734,19 @@ private static void addFieldToPinotSchema(Schema
pinotSchema, DataType dataType,
public static List<Map<String, String>> flatten(String jsonString,
JsonIndexConfig jsonIndexConfig)
throws IOException {
JsonNode jsonNode;
+ List<Map<String, String>> flattenedJson;
try {
jsonNode = JsonUtils.stringToJsonNode(jsonString);
- } catch (JsonProcessingException e) {
+ flattenedJson = JsonUtils.flatten(jsonNode, jsonIndexConfig);
+ } catch (JsonProcessingException | IllegalArgumentException e) {
Review Comment:
Should we just catch `Exception`?
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/inv/text/LuceneFSTIndexCreator.java:
##########
@@ -50,36 +56,56 @@ public class LuceneFSTIndexCreator implements
FSTIndexCreator {
*
* @param indexDir Index directory
* @param columnName Column name for which index is being created
+ * @param tableNameWithType table name with type
* @param sortedEntries Sorted entries of the unique values of the column.
* @throws IOException
*/
- public LuceneFSTIndexCreator(File indexDir, String columnName, String[]
sortedEntries)
+ public LuceneFSTIndexCreator(File indexDir, String columnName, String
tableNameWithType, String[] sortedEntries)
throws IOException {
+ this(indexDir, columnName, tableNameWithType, sortedEntries, new
FSTBuilder());
+ }
+
+ @VisibleForTesting
+ public LuceneFSTIndexCreator(File indexDir, String columnName, String
tableNameWithType, String[] sortedEntries,
+ FSTBuilder fstBuilder)
+ throws IOException {
+ _tableNameWithType = tableNameWithType;
_fstIndexFile = new File(indexDir, columnName +
V1Constants.Indexes.LUCENE_V912_FST_INDEX_FILE_EXTENSION);
- _fstBuilder = new FSTBuilder();
+ _fstBuilder = fstBuilder;
_dictId = 0;
if (sortedEntries != null) {
for (_dictId = 0; _dictId < sortedEntries.length; _dictId++) {
- _fstBuilder.addEntry(sortedEntries[_dictId], _dictId);
+ try {
+ _fstBuilder.addEntry(sortedEntries[_dictId], _dictId);
+ } catch (IOException ex) {
+ // Caught exception while trying to add, update metric and skip the
document
+ String metricKeyName =
+ _tableNameWithType + "-" +
FstIndexType.INDEX_DISPLAY_NAME.toUpperCase(Locale.US) + "-indexingError";
+ ServerMetrics.get().addMeteredTableValue(metricKeyName,
ServerMeter.INDEXING_FAILURES, 1);
Review Comment:
Continue on error should be a property for all index creators. Can you take
a stab to see if you can add it? Can be a separate PR
##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/creator/JsonIndexCreator.java:
##########
@@ -52,6 +51,12 @@ default void add(Object[] values, @Nullable int[] dictIds) {
void add(String jsonString)
throws IOException;
+ /**
+ * Adds the next json value for Map type
+ */
+ void add(Object jsonMap)
Review Comment:
Should we make it:
```suggestion
void add(Map jsonMap)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]