[ https://issues.apache.org/jira/browse/TIKA-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-4539:
------------------------------
Description:
For tika-server, grpc and some other use cases, it would be useful to have an
"initialization" config for parsers and other components, plus an "update"
capability.
I hacked something out for the PDFParserConfig where we literally store the
method names for what was updated. This is really unpleasant.
If we're willing to use Lombok (I had to install a plugin in IntelliJ, but it
was easy), we could create a ConfigBase like so:
{noformat}
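import java.io.IOException;

import com.fasterxml.jackson.annotation.JsonIgnore;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ConfigBase {

    static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();

    // JSON snapshot of the basis, taken on the first call to update().
    // Later changes made to the basis through setters are not reflected here.
    @JsonIgnore
    private byte[] baseJson;

    protected <T extends ConfigBase> T update(T basis, String update, Class<T> clazz)
            throws IOException {
        if (baseJson == null) {
            baseJson = OBJECT_MAPPER.writeValueAsBytes(basis);
        }
        // Deserialize a fresh copy so the basis itself is never mutated.
        T base = OBJECT_MAPPER.readValue(baseJson, clazz);
        // Overlay only the properties present in the update JSON.
        return OBJECT_MAPPER
                .readerForUpdating(base)
                .readValue(update);
    }
}
{noformat}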
Then PDFParserConfig can extend it. There are a lot of annotations, but we
don't need to write constructors or setters/getters by hand. Obviously, the
fields are for demo purposes...
{noformat}
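import java.io.IOException;

import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.EqualsAndHashCode;
import lombok.NoArgsConstructor;

@EqualsAndHashCode(callSuper = true)
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
public class PDFParserConfig extends ConfigBase {

    @Builder.Default
    private String color = "blue";

    @Builder.Default
    private int length = -5;

    @Builder.Default
    private int width = 10;

    // Returns an updated copy; this instance is left unchanged.
    public PDFParserConfig cloneAndUpdate(String json) throws IOException {
        return update(this, json, PDFParserConfig.class);
    }
}
{noformat}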
Then we can see results:
{noformat}
@Test
public void testOne() throws Exception {
    System.out.println("default: " + new PDFParserConfig());
    // builder() is the static factory that Lombok's @Builder generates
    PDFParserConfig basis = PDFParserConfig.builder().color("white").build();
    String json = """
            {
                "color":"green"
            }
            """;
    System.out.println("BASIS before: " + basis);
    System.out.println(basis.cloneAndUpdate(json));
    System.out.println("BASIS after: " + basis);
}
{noformat}
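For anyone who wants to see the intended semantics without pulling in Lombok or Jackson, here is a minimal stdlib-only sketch. It is not the proposed implementation: a plain Map stands in for the config object, and the class name CloneAndUpdateSketch and its cloneAndUpdate helper are made up for illustration. It only shows the contract the test above is meant to demonstrate, namely that the update is overlaid on a copy and the basis stays untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: a Map stands in for a config object.
public class CloneAndUpdateSketch {

    // Returns a new map holding a copy of base with the overrides applied on top.
    // The base map itself is never modified.
    public static Map<String, Object> cloneAndUpdate(Map<String, Object> base,
                                                     Map<String, Object> overrides) {
        Map<String, Object> merged = new LinkedHashMap<>(base);
        merged.putAll(overrides);
        return merged;
    }

    public static void main(String[] args) {
        // "Initialization" config, mirroring the demo fields above
        Map<String, Object> basis = new LinkedHashMap<>();
        basis.put("color", "white");
        basis.put("length", -5);
        basis.put("width", 10);

        // Per-request "update" that touches a single property
        Map<String, Object> updated = cloneAndUpdate(basis, Map.of("color", "green"));

        System.out.println("BASIS:   " + basis);
        System.out.println("UPDATED: " + updated);
    }
}
```

Only "color" differs in the updated copy; "length" and "width" carry over from the basis, and the basis still reports "white" afterwards.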
> Lombok and jackson for configs in 4.x?
> --------------------------------------
>
> Key: TIKA-4539
> URL: https://issues.apache.org/jira/browse/TIKA-4539
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Major