[jira] [Created] (JCR-3075) incorrect HTML excerpt generation for queries on japanese text content

Julian Reschke (JIRA) Fri, 16 Sep 2011 07:22:33 -0700

incorrect HTML excerpt generation for queries on japanese text content 
-----------------------------------------------------------------------


                 Key: JCR-3075
                 URL: https://issues.apache.org/jira/browse/JCR-3075
             Project: Jackrabbit Content Repository
          Issue Type: Bug
          Components: jackrabbit-core
            Reporter: Julian Reschke
            Priority: Minor


The generated excerpt highlights single characters instead of full words. Test 
case (to be added to FullTextQueryTest):

     public void testJapaneseAndHighlight() throws RepositoryException {
        // 
http://translate.google.com/#auto|en|%E3%82%B3%E3%83%B3%E3%83%86%E3%83%B3%E3%83%88
        String jContent = "\u30b3\u30fe\u30c6\u30f3\u30c8";
        // http://translate.google.com/#auto|en|%E3%83%86%E3%82%B9%E3%83%88
        String jTest = "\u30c6\u30b9\u30c8";
        
        String content = "some text with japanese: " + jContent
                + " ('content')" + " and " + jTest + " ('test').";

        // expected excerpt; note this may change if excerpt providers change
        String expectedExcerpt = "<div><span>some text with japanese: " + 
jContent
                + " ('content') and <strong>" + jTest
                + "</strong> ('test').</span></div>";
        
        Node n = testRootNode.addNode("node1");
        n.setProperty("title", content);
        testRootNode.getSession().save();
        
        String xpath = "/jcr:root" + testRoot + "/element(*, nt:unstructured)"
                + "[jcr:contains(., '" + jTest + "')]/rep:excerpt(.)";
        Query q = superuser.getWorkspace().getQueryManager()
                .createQuery(xpath, Query.XPATH);
        
        QueryResult qr = q.execute();
        RowIterator it = qr.getRows();
        int cnt = 0;
        while (it.hasNext()) {
            cnt++;
            Row found = it.nextRow();
            assertEquals(n.getPath(), found.getPath());
            String excerpt = found.getValue("rep:excerpt(.)").getString();
            assertEquals(expectedExcerpt, excerpt);
        }
        
        assertEquals(1, cnt);
    }


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (JCR-3075) incorrect HTML excerpt generation for queries on japanese text content

Reply via email to