Hi Riyafa,

On 27 Mar 2016, at 23:50, Riyafa Abdul Hameed wrote:

I modified the pull request and was able to fix most of the previous
errors, but this brought up a different kind of error:

Consider the function fn:tokenize. According to the definition [1] it
should return a sequence of strings.
*Definition:*

fn:tokenize($input as xs:string?, $pattern as xs:string) as xs:string*
fn:tokenize($input as xs:string?, $pattern as xs:string,
            $flags as xs:string) as xs:string*

where xs:string* represents a sequence of strings. Earlier, not having
read this definition properly, I returned a single string with the
tokenized parts separated by single spaces. The tests did not all pass
because they were expecting a sequence.

Now I have modified the code to return a sequence of strings for the
above function, but now most of the tests fail. I have found the reason
for this by remote debugging the org.apache.vxquery.result.ResultUtils
class:

The test suite requires the final result to be a sequence of strings
separated by a single space, but the string generated in the ResultUtils
class has the strings separated by the newline (\n) character, which is
why the tests on tokenize fail.

For example, consider this test (test 14015 in [2]):

fn:tokenize("The cat sat on the mat", "\s+")

The expected result is:
The cat sat on the mat

But the string generated (in ResultUtils) and printed on the console is:

The
cat
sat
on
the
mat

with each string separated by a newline character.

Other tests fail for the same reason (e.g. 13950 in [2]).
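
To make the difference between the two outputs concrete, here is a minimal
Python sketch. It is not VXQuery code: the helper name tokenize is
hypothetical, and Python's re module only approximates the XPath regular
expression dialect, so it illustrates the separator issue and nothing more.

```python
import re


def tokenize(text, pattern):
    """Split text on a regex, loosely mimicking fn:tokenize (hypothetical helper)."""
    return re.split(pattern, text)


# The query from test 14015, evaluated to a sequence (here: a Python list).
tokens = tokenize("The cat sat on the mat", r"\s+")

# Items separated by a newline, as seen on the console.
newline_joined = "\n".join(tokens)

# Items separated by a single space, as the test expects.
space_joined = " ".join(tokens)

print(space_joined)
```

The sequence itself is the same in both cases; only the separator chosen
when writing the items out differs.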

Shall I create an issue in JIRA so that a fix could be made to use a
single space, instead of a newline character, to separate the values of a
sequence when printing to the console?

Yes, you are right, this is indeed a serialization problem (i.e. a problem
of how we serialize instances of the XQuery Data Model) and it should be
captured in a JIRA. However, I am not sure that we always want to move to
a single space. I think that the issue should state that we need to find
out where the new line is introduced and that we need to discuss/decide in
which cases we want a single space and in which ones a new line is
preferable.

Also, I am not sure I understand your instructions on how to make a PR for
the change in the materialized results. What do you mean by "checking that
only the order has changed"? (Sorry if it is a silly question.)

On the website we have instructions on how to generate the XQTS results.
However, the results that are currently checked in are not in the order
that they would be in if you followed the instructions. The instructions
tell us to sort the results using "sort", while the checked-in results
were sorted using "sort -V". While "sort -V" produces a more sensibly
sorted result, the "-V" option is not available in "sort" on all
platforms (e.g. it is not available on OS X). So I think that we should
move back to sorting the results with plain "sort" to ensure that
everybody can update and compare the results. [1]
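
The difference between the two orderings can be sketched in Python. This is
an illustration only: the names in the list are hypothetical, version_key is
a rough approximation of what "sort -V" does, and Python's default string
ordering stands in for plain "sort" (whose real output also depends on the
locale).

```python
import re


def version_key(name):
    """Split a name into text and digit chunks so digit runs compare as numbers.

    Only an approximation of GNU "sort -V", for illustration.
    """
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", name)]


# Hypothetical entries, not taken from the real results file.
names = ["test-10", "test-2", "test-1"]

# Character-by-character ordering, similar to plain "sort".
plain = sorted(names)

# Number-aware ordering, similar to "sort -V".
natural = sorted(names, key=version_key)

print(plain)
print(natural)
```

The plain ordering puts "test-10" before "test-2" because it compares one
character at a time, which is why the two sort commands produce files in a
different order.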

My proposal was that you could
1) take the current master branch,
2) run the tests,
3) sort the results of the tests with "sort",
4) sort the checked in results with "sort",
5) verify that the sorted results from 3) and 4) are identical, and
6) create a new pull request to update the now differently sorted reference
   results.
If 5) succeeds you will have done a reasonable check that only the order of
the reference results has changed.
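
The check in step 5) can be sketched in Python as follows. The function
names and the sample lines are hypothetical, and the sketch assumes that
each result is one line of text; it only shows the idea of comparing two
listings while ignoring their order.

```python
def read_lines(path):
    """Read a results file into a list of lines (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().splitlines()


def same_lines_ignoring_order(lines_a, lines_b):
    """Return True if both listings hold the same lines, in any order."""
    return sorted(lines_a) == sorted(lines_b)


# Hypothetical sample data standing in for the two results files.
fresh_results = ["case-b pass", "case-a pass", "case-c fail"]
checked_in_results = ["case-a pass", "case-c fail", "case-b pass"]

print(same_lines_ignoring_order(fresh_results, checked_in_results))
```

With real files, the two lists would come from read_lines applied to the
freshly generated results and to the checked-in results.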

If all of this works you will have
a) validated that you indeed get the expected reference results before your
   fix and
b) created a pull request for reference results in a form that makes the
   comparison with the results that you'll get after your fix easier.

I think that the big diff [2] that you see in the materialized reference
results right now is due to the changes introduced by your change and the
changes introduced by the different sort order.

Does this make sense?

Cheers,
Till

[1] https://issues.apache.org/jira/browse/VXQUERY-187
[2] https://github.com/apache/vxquery/pull/32/commits/131915a2bb38b06e6ef2d27a24c50201d1dab13c


Please kindly help.


[1] https://www.w3.org/TR/xpath-functions/#func-tokenize

[2] http://riyafa.github.io/Riyafa-Abdul-Hameed--web-page/others/full_report.html

Thank you.

Yours sincerely,
Riyafa

On 16 March 2016 at 10:59, Till Westmann <[email protected]> wrote:

Hi,

I took a brief look into your change and here are a few next steps that
could help to get a better handle on the issue:

1) One of the problems with the diff for the expected results is that the
   instructions to create the diff that you find on the website are not
   consistent with the current reality [1]. So one good step would be
   a) recreating the expected results with an unmodified checkout,
      following the instructions on the website,
   b) checking that only the order has changed, and
   c) creating a PR for that.

2) Rerun the tests with your patch, categorize the failures by stack
   trace, and explain at least one failure in more detail on the list
   (with a stack trace, pointers to the code, and possible explanations,
   if they come to mind). E.g. it would be good to see in the e-mail what
   the system did when the creation of a sequence for fn:tokenize didn’t
   work.

Does this make sense?

Cheers,
Till

[1] https://issues.apache.org/jira/browse/VXQUERY-187


On 11 Mar 2016, at 18:11, Riyafa Abdul Hameed wrote:

Hi,

I tried fixing the errors from the test results, but I was unable to fix
some of them. You can find the full error report here [1]. The test cases
related to this PR are 13865 to 14054.
There are errors related to exception handling, and since I am using the
available Java functions I am not sure how I could catch such errors.
Also, I don't seem to be matching UTF-8 strings. I tried to get the byte
array and convert it to a UTF-8 string, but it wouldn't work.
The related errors are 13918 to 13921.
According to [2], I think we should convert all the UTF-8 characters as
appropriate when adding them to a StringBuilder in the
UTF8StringPointable class. I am not sure how I could do that.
Also I tried converting the result of fn:tokenize to a sequence of strings
(using sequence builder) instead of a single string, but in vain.
Maybe I have understood things incorrectly. Can you please help me figure
out how I could fix these errors?

(I sent a previous mail which was not delivered because I tried to attach
the error report)

[1] http://riyafa.github.io/Riyafa-Abdul-Hameed--web-page/others/full_report.html
[2] http://stackoverflow.com/a/5729843/3599535

Thank you.

Yours sincerely,
Riyafa

On 10 March 2016 at 14:04, Till Westmann <[email protected]> wrote:

Hi Riyafa,

I just looked at your PR [1] and realized that the diff in the results
file is very big.
I think that this might be due to a recent commit by Preston [2] that
changed the sorting of the results file a bit.
Could you take a look if that’s indeed the case and, if so, create a new
results file with the same order that’s currently checked in?
Otherwise, could you validate that queries that use the new functions
work correctly now?

Cheers,
Till

[1] https://github.com/apache/vxquery/pull/32/
[2] https://github.com/apache/vxquery/commit/43852a5476ccb33bf9ee58e27468b400cc169d6a#diff-39476c050696c8ab9f59540b607ba92e




--
Riyafa Abdul Hameed
Undergraduate, University of Moratuwa

Email: [email protected]
Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
<http://facebook.com/riyafa.ahf>  <http://lk.linkedin.com/in/riyafa>
<http://twitter.com/Riyafa1>



