[ https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175913#comment-16175913 ]

ASF GitHub Bot commented on TIKA-2400:
--------------------------------------

smadha commented on a change in pull request #208: Fix for TIKA-2400 Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139325811
 
 

 ##########
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##########
 @@ -140,29 +133,17 @@ public synchronized void parse(InputStream stream, ContentHandler handler, Metad
             for (RecognisedObject object : objects) {
                 if (object instanceof CaptionObject) {
                     if (xhtmlStartVal == null) xhtmlStartVal = "captions";
-                    LOG.debug("Add {}", object);
-                    String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-                            object.getLabel(), object.getConfidence());
-                    metadata.add(MD_KEY_IMG_CAP, mdValue);
-                    acceptedObjects.add(object);
+                    String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
+                    metadata.add(MD_KEY_IMG_CAP, mdVal);
                     xhtmlIds.add(String.valueOf(count++));
                 } else {
                     if (xhtmlStartVal == null) xhtmlStartVal = "objects";
-                    if (object.getConfidence() >= minConfidence) {
-                        count++;
-                        LOG.info("Add {}", object);
-                        String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-                                object.getLabel(), object.getConfidence());
-                        metadata.add(MD_KEY_OBJ_REC, mdValue);
-                        acceptedObjects.add(object);
-                        xhtmlIds.add(object.getId());
-                        if (count >= topN) {
-                            break;
-                        }
-                    } else {
-                        LOG.warn("Object {} confidence {} less than min {}", object, object.getConfidence(), minConfidence);
-                    }
+                    String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
+                    metadata.add(MD_KEY_OBJ_REC, mdVal);
+                    xhtmlIds.add(object.getId());
                 }
+                LOG.info("Add {}", object);
 
 Review comment:
   - [ ] Thanks for following the good logging practice of using `{}` placeholders. It would be great if you could also replace the String concatenation in [`RecognisedObject.toString`](https://github.com/ThejanW/tika/blob/92c65e0a43e7f09a0566bec34f352314dffe5def/tika-parsers/src/main/java/org/apache/tika/parser/recognition/RecognisedObject.java#L84-L90) with a `StringBuffer` or `String.format`. Most IDEs can do this conversion in a few clicks. Thanks in advance for the cleanup.
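
For illustration, the requested cleanup might look roughly like the sketch below. It is not the actual `RecognisedObject` source: `RecognisedObjectSketch` is a hypothetical stand-in, and its fields (`label`, `id`, `confidence`) are assumed only from the getters that appear in the diff above. The builder variant uses `StringBuilder`, the unsynchronized counterpart of the `StringBuffer` mentioned in the comment.

```java
import java.util.Locale;

// Hypothetical stand-in for RecognisedObject. The field set is an assumption
// based on getLabel(), getId() and getConfidence() in the diff above.
class RecognisedObjectSketch {
    private final String label;
    private final String id;
    private final double confidence;

    RecognisedObjectSketch(String label, String id, double confidence) {
        this.label = label;
        this.id = id;
        this.confidence = confidence;
    }

    // Variant 1: String.format with an explicit locale, so the decimal
    // separator does not depend on the JVM's default locale.
    @Override
    public String toString() {
        return String.format(Locale.ENGLISH,
                "RecognisedObjectSketch{label='%s', id='%s', confidence=%.5f}",
                label, id, confidence);
    }

    // Variant 2: an explicit builder instead of chained '+' concatenation.
    String toStringWithBuilder() {
        StringBuilder sb = new StringBuilder("RecognisedObjectSketch{");
        sb.append("label='").append(label).append('\'');
        sb.append(", id='").append(id).append('\'');
        sb.append(", confidence=").append(confidence);
        sb.append('}');
        return sb.toString();
    }
}
```

Either variant works with `LOG.info("Add {}", object)`, because the logger only calls `toString()` when the message is actually emitted. The `String.format` variant mirrors the `"%s (%.5f)"` metadata formatting already used in this method.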
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Standardizing current Object Recognition REST parsers
> -----------------------------------------------------
>
>                 Key: TIKA-2400
>                 URL: https://issues.apache.org/jira/browse/TIKA-2400
>             Project: Tika
>          Issue Type: Sub-task
>          Components: parser
>            Reporter: Thejan Wijesinghe
>            Priority: Minor
>             Fix For: 1.17
>
>
> # Adding apiBaseUris and refactoring the current Object Recognition REST parsers.
> # Refactoring the Dockerfiles related to those parsers.
> # Moving the logic for checking minimum confidence into the servers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
