[ 
https://issues.apache.org/jira/browse/TIKA-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-4539:
------------------------------
    Description: 
For tika-server, gRPC, and some other use cases, it would be useful to have an 
"initialization" config for parsers and other components, plus an "update" 
capability.

 

I hacked something out for the PDFParserConfig back in Tika 1.x where we 
literally store the method names for what was updated. This is really 
unpleasant.

 

If we're willing to use Lombok (I had to install a plugin in IntelliJ, but it 
was easy), we could create a ConfigBase like so:

 
{noformat}
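import java.io.IOException;

import com.fasterxml.jackson.annotation.JsonIgnore;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ConfigBase {

    static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();

    // Serialized snapshot of the basis, taken on the first update() call.
    // Later setter calls on the basis are not reflected in it.
    @JsonIgnore
    private byte[] baseJson;

    // Deserializes a fresh copy of the snapshot, then applies the JSON update to that copy.
    protected <T extends ConfigBase> T update(T basis, String update, Class<T> clazz) throws IOException {
        if (baseJson == null) {
            baseJson = OBJECT_MAPPER.writeValueAsBytes(basis);
        }
        T base = OBJECT_MAPPER.readValue(baseJson, clazz);
        return OBJECT_MAPPER
                .readerForUpdating(base)
                .readValue(update);
    }
}
{noformat}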
 

Then PDFParserConfig extends it. There are a lot of annotations, but we don't 
need to write constructors or setters/getters by hand. Obviously, the fields 
are for demo purposes...
{noformat}
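import java.io.IOException;

import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.EqualsAndHashCode;
import lombok.NoArgsConstructor;

// ConfigBase does not override equals/hashCode, so callSuper = true would
// reduce equals() to identity; compare this class's fields only.
@EqualsAndHashCode(callSuper = false)
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
public class PDFParserConfig extends ConfigBase {

    @Builder.Default
    private String color = "blue";
    @Builder.Default
    private int length = -5;
    @Builder.Default
    private int width = 10;

    // Returns an updated copy; this instance is left unchanged.
    public PDFParserConfig cloneAndUpdate(String json) throws IOException {
        return update(this, json, PDFParserConfig.class);
    }
}
{noformat}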
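
To make concrete what the annotations save, here is a rough hand-written sketch of the members Lombok would otherwise generate for the demo class. It is not Lombok's actual output: the name HandWrittenPdfConfig is hypothetical, equals/hashCode and the all-args constructor are left out for brevity, and it deliberately does not extend ConfigBase so that it compiles with the JDK alone.

```java
// Hypothetical hand-written counterpart of the demo PDFParserConfig.
// Sketch only: not Lombok's real generated code; equals/hashCode omitted.
final class HandWrittenPdfConfig {

    private String color = "blue";
    private int length = -5;
    private int width = 10;

    // Roughly what @NoArgsConstructor provides; field initializers supply the defaults.
    public HandWrittenPdfConfig() {
    }

    // Roughly what @Data provides: a getter and a setter per field.
    public String getColor() { return color; }
    public void setColor(String color) { this.color = color; }
    public int getLength() { return length; }
    public void setLength(int length) { this.length = length; }
    public int getWidth() { return width; }
    public void setWidth(int width) { this.width = width; }

    // @Data also provides toString() in a Name(field=value, ...) shape.
    @Override
    public String toString() {
        return "HandWrittenPdfConfig(color=" + color
                + ", length=" + length + ", width=" + width + ")";
    }

    // Roughly what @Builder provides: a static factory plus a fluent builder.
    public static Builder builder() {
        return new Builder();
    }

    public static final class Builder {
        // Same defaults as the fields, as @Builder.Default would arrange.
        private String color = "blue";
        private int length = -5;
        private int width = 10;

        public Builder color(String color) { this.color = color; return this; }
        public Builder length(int length) { this.length = length; return this; }
        public Builder width(int width) { this.width = width; return this; }

        // Each call creates a new, independent config instance.
        public HandWrittenPdfConfig build() {
            HandWrittenPdfConfig config = new HandWrittenPdfConfig();
            config.setColor(color);
            config.setLength(length);
            config.setWidth(width);
            return config;
        }
    }

    public static void main(String[] args) {
        System.out.println(HandWrittenPdfConfig.builder().color("white").build());
    }
}
```

That is roughly fifty lines for three fields, which is the boilerplate the five annotations replace.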
 

Then we can see results:
{noformat}
@Test
public void testOne() throws Exception {
    System.out.println("default: " + new PDFParserConfig());
    PDFParserConfig basis = PDFParserConfig.builder().color("white").build();

    String json = """
                {
                    "color":"green"
                }
            """;
    System.out.println("BASIS before: " + basis);
    System.out.println("cloned and updated: " + basis.cloneAndUpdate(json));
    System.out.println("BASIS after: " + basis);
} {noformat}
{noformat}
default: PDFParserConfig(color=blue, length=-5, width=10)
BASIS before: PDFParserConfig(color=white, length=-5, width=10)
cloned and updated: PDFParserConfig(color=green, length=-5, width=10)
BASIS after: PDFParserConfig(color=white, length=-5, width=10) {noformat}
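
The property this output demonstrates is that an update touches only the fields named in the JSON and leaves the basis alone. A minimal JDK-only sketch of that contract follows. It swaps Jackson's readerForUpdating for a plain Map of overrides, and DemoConfig, defaults and withUpdates are hypothetical names for illustration, not Tika or Jackson API.

```java
import java.util.Map;

// Hypothetical immutable stand-in for the demo config; not part of Tika.
record DemoConfig(String color, int length, int width) {

    // Defaults matching the demo fields.
    static DemoConfig defaults() {
        return new DemoConfig("blue", -5, 10);
    }

    // Returns a new config with only the named fields replaced.
    // The Map stands in for a parsed JSON update; unknown keys are rejected.
    DemoConfig withUpdates(Map<String, Object> updates) {
        String newColor = color;
        int newLength = length;
        int newWidth = width;
        for (Map.Entry<String, Object> entry : updates.entrySet()) {
            switch (entry.getKey()) {
                case "color" -> newColor = (String) entry.getValue();
                case "length" -> newLength = (Integer) entry.getValue();
                case "width" -> newWidth = (Integer) entry.getValue();
                default -> throw new IllegalArgumentException(
                        "Unknown field: " + entry.getKey());
            }
        }
        // The receiver is never modified; callers get a separate instance.
        return new DemoConfig(newColor, newLength, newWidth);
    }

    public static void main(String[] args) {
        DemoConfig basis = defaults().withUpdates(Map.of("color", "white"));
        DemoConfig updated = basis.withUpdates(Map.of("color", "green"));
        System.out.println("BASIS: " + basis);
        System.out.println("cloned and updated: " + updated);
    }
}
```

Because the record is immutable, the "basis is unchanged" guarantee holds by construction here, whereas the ConfigBase approach gets it by round-tripping through a serialized snapshot.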


> Lombok and jackson for configs in 4.x?
> --------------------------------------
>
>                 Key: TIKA-4539
>                 URL: https://issues.apache.org/jira/browse/TIKA-4539
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
