[ https://issues.apache.org/jira/browse/TIKA-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-4539:
------------------------------
Description:
For tika-server and grpc and some other use cases, it would be useful to have an
"initialization" config for parsers and other things and then an "update"
capability.
I hacked something out for the PDFParserConfig back in Tika 1.x where we
literally store the method names for what was updated. This is really
unpleasant.
If we're willing to use Lombok (I had to install a plugin in IntelliJ, but it
was easy), we could create a ConfigBase like so:
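{noformat}
import java.io.IOException;

import com.fasterxml.jackson.annotation.JsonIgnore;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ConfigBase {

    static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();

    // JSON snapshot of the basis, captured on the first call to update()
    @JsonIgnore
    private byte[] baseJson;

    protected <T extends ConfigBase> T update(T basis, String update, Class<T> clazz)
            throws IOException {
        if (baseJson == null) {
            baseJson = OBJECT_MAPPER.writeValueAsBytes(basis);
        }
        // deserialize a fresh copy so the basis itself is never modified
        T base = OBJECT_MAPPER.readValue(baseJson, clazz);
        // overlay only the fields present in the update JSON
        return OBJECT_MAPPER
                .readerForUpdating(base)
                .readValue(update);
    }
}
{noformat}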
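To make the copy-then-overlay idea concrete without pulling in Jackson or Lombok, here is a minimal, dependency-free sketch. It is not the proposed implementation: DemoConfig, CloneAndUpdateDemo and the Map-based update are hypothetical stand-ins for the config class and the JSON update string.

```java
import java.util.Map;
import java.util.Set;

class CloneAndUpdateDemo {
    public static void main(String[] args) {
        DemoConfig basis = new DemoConfig("white", -5, 10);
        DemoConfig updated = basis.cloneAndUpdate(Map.<String, Object>of("color", "green"));
        System.out.println("basis:   " + basis);
        System.out.println("updated: " + updated);
    }
}

// Hypothetical immutable config; the fields mirror the demo fields in this issue.
final class DemoConfig {
    private static final Set<String> FIELDS = Set.of("color", "length", "width");

    final String color;
    final int length;
    final int width;

    DemoConfig(String color, int length, int width) {
        this.color = color;
        this.length = length;
        this.width = width;
    }

    // Returns a new config: keys present in the update win, everything else
    // is copied from this instance, which is left untouched.
    DemoConfig cloneAndUpdate(Map<String, Object> update) {
        if (!FIELDS.containsAll(update.keySet())) {
            throw new IllegalArgumentException("Unknown field in update: " + update.keySet());
        }
        return new DemoConfig(
                (String) update.getOrDefault("color", color),
                (Integer) update.getOrDefault("length", length),
                (Integer) update.getOrDefault("width", width));
    }

    @Override
    public String toString() {
        return "DemoConfig(color=" + color + ", length=" + length + ", width=" + width + ")";
    }
}
```

The downside of the hand-written version is that every field has to be listed in cloneAndUpdate; handing the overlay to a JSON mapper is what removes that per-field code.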
Then for the PDFParserConfig: there are a lot of annotations, but we don't need
to write constructors or setters/getters by hand. Obviously, the fields are for
demo purposes...
{noformat}
import java.io.IOException;

import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.EqualsAndHashCode;
import lombok.NoArgsConstructor;

@EqualsAndHashCode(callSuper = true)
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
public class PDFParserConfig extends ConfigBase {

    @Builder.Default
    private String color = "blue";

    @Builder.Default
    private int length = -5;

    @Builder.Default
    private int width = 10;

    public PDFParserConfig cloneAndUpdate(String json) throws IOException {
        return update(this, json, PDFParserConfig.class);
    }
}
{noformat}
Then we can see results:
{noformat}
@Test
public void testOne() throws Exception {
    System.out.println("default: " + new PDFParserConfig());
    PDFParserConfig basis = PDFParserConfig.builder().color("white").build();
    String json = """
            {
              "color":"green"
            }
            """;
    System.out.println("BASIS before: " + basis);
    System.out.println("cloned and updated: " + basis.cloneAndUpdate(json));
    System.out.println("BASIS after: " + basis);
}
{noformat}
{noformat}
default: PDFParserConfig(color=blue, length=-5, width=10)
BASIS before: PDFParserConfig(color=white, length=-5, width=10)
cloned and updated: PDFParserConfig(color=green, length=-5, width=10)
BASIS after: PDFParserConfig(color=white, length=-5, width=10) {noformat}
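One thing to watch in the ConfigBase sketch: baseJson is captured on the first call to update() and reused afterwards, and @Data also generates setters. A setter call on the basis after the first cloneAndUpdate() would therefore not show up in later clones. The dependency-free sketch below mimics that caching with a plain Map snapshot. CachingConfig and StaleSnapshotDemo are hypothetical names, and the Map stands in for the JSON bytes.

```java
import java.util.HashMap;
import java.util.Map;

class StaleSnapshotDemo {
    public static void main(String[] args) {
        CachingConfig basis = new CachingConfig("white");
        // The first call takes the snapshot of the basis.
        System.out.println("first clone:  " + basis.cloneAndUpdate(Map.of()).color);
        // Mutating the basis afterwards does not refresh the snapshot.
        basis.color = "red";
        System.out.println("second clone: " + basis.cloneAndUpdate(Map.of()).color);
    }
}

// Hypothetical mutable config that caches its first snapshot, like baseJson.
class CachingConfig {
    String color;
    private Map<String, String> snapshot; // plays the role of baseJson

    CachingConfig(String color) {
        this.color = color;
    }

    CachingConfig cloneAndUpdate(Map<String, String> update) {
        if (snapshot == null) {
            snapshot = Map.of("color", color); // captured once, never refreshed
        }
        Map<String, String> merged = new HashMap<>(snapshot);
        merged.putAll(update);
        return new CachingConfig(merged.get("color"));
    }
}
```

Both clones come out "white" here. If that is not the intended behavior, options would be to make the config immutable (for example Lombok's @Value instead of @Data) or to re-serialize the basis on every update.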
> Lombok and jackson for configs in 4.x?
> --------------------------------------
>
> Key: TIKA-4539
> URL: https://issues.apache.org/jira/browse/TIKA-4539
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)