Hi Riyafa,
On 27 Mar 2016, at 23:50, Riyafa Abdul Hameed wrote:
I modified the pull request and was able to fix most of the previous
errors, but this brought up a different kind of error.
Consider the function fn:tokenize. According to the definition [1], it
should return a sequence of strings.
*Definition:*
fn:tokenize($input as xs:string?, $pattern as xs:string) as xs:string*
fn:tokenize($input as xs:string?, $pattern as xs:string, $flags as xs:string) as xs:string*
where xs:string* represents a sequence of strings. Earlier, not having
read this definition properly, I returned a single string with the
tokenized parts separated by single spaces, so the results did not pass
because the tests were expecting a sequence.
Now I have modified the code to return a sequence of strings for the
above function, but now most of the tests fail. I found the reason by
remote debugging the org.apache.vxquery.result.ResultUtils class: the
test suite expects the final result to be a sequence of strings separated
by a single whitespace, but the string generated in the ResultUtils class
has the strings separated by the newline (\n) character, which is why the
tests on tokenize fail.
For example, consider this test (test 14015 in [2]):
fn:tokenize("The cat sat on the mat", "\s+")
The expected result is:
The cat sat on the mat
But the string generated (at ResultUtils) and printed on the console is:
The
cat
sat
on
the
mat
each string separated by a new line character.
Other tests fail for the same reason (e.g. 13950 in [2]).
Shall I create an issue in JIRA so that a fix can be made that uses a
single whitespace, instead of a newline character, to separate the values
in a sequence when printing to the console?
Yes, you are right, this is indeed a serialization problem (i.e. a problem
of how we serialize instances of the XQuery Data Model) and it should be
captured in a JIRA. However, I am not sure that we always want to move to
a single space. I think the issue should state that we need to find out
where the newline is introduced and that we need to discuss/decide in
which cases we want a single space and in which cases a newline is
preferable.
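To make the two candidates concrete, here is a minimal Python sketch (not
VXQuery code) of how the same sequence looks under each separator. The
`serialize` helper and its `separator` parameter are hypothetical names
used only for illustration, and Python's `re.split` is only a stand-in
for fn:tokenize.

```python
import re


def serialize(items, separator):
    """Join the items of a sequence into one output string.

    Hypothetical helper for illustration; it is not the VXQuery serializer.
    """
    return separator.join(items)


# Stand-in for fn:tokenize("The cat sat on the mat", "\s+").
tokens = re.split(r"\s+", "The cat sat on the mat")

# Single-space form, which is what the expected test result looks like.
space_form = serialize(tokens, " ")

# One-item-per-line form, which is what is currently printed.
newline_form = serialize(tokens, "\n")

print(space_form)
print(newline_form)
```

Both forms carry the same six items; only the separator chosen at
serialization time differs.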
Also, I am not sure I understand your instructions on how to make a PR
for the change in the materialized results. What do you mean by "checking
that only the order has changed"? (Sorry if it is a silly question.)
On the website we have instructions on how to generate the XQTS results.
However, the results that are currently checked in are not in the order
that they would be in if you followed the instructions. The instructions
tell us to sort the results using "sort", while the checked-in results
were sorted using "sort -V". While "sort -V" creates a result that’s more
sensibly sorted, the "-V" option is not available in "sort" on all
platforms (e.g. it is not available on OS X). So I think that we should
move back to sorting the results with plain "sort" to ensure that
everybody can update and compare the results. [1]
My proposal was that you could
1) take the current master branch,
2) run the tests,
3) sort the results of the tests with "sort",
4) sort the checked in results with "sort",
5) verify that the sorted results from 3) and 4) are identical, and
6) create a new pull request to update the now differently sorted
reference results.
If 5) succeeds you will have done a reasonable check that only the order
of the reference results has changed.
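As a rough illustration of the check in 5), the following Python sketch
compares two result files while ignoring line order. The function name
and the file names in the note below are placeholders, not the actual
paths in the repository.

```python
from pathlib import Path


def same_lines_ignoring_order(path_a, path_b):
    """Return True if both files contain the same lines, in any order."""
    lines_a = Path(path_a).read_text(encoding="utf-8").splitlines()
    lines_b = Path(path_b).read_text(encoding="utf-8").splitlines()
    # Sorting both sides removes the effect of "sort" versus "sort -V";
    # duplicates are kept, so a dropped or added line is still detected.
    return sorted(lines_a) == sorted(lines_b)
```

Called as same_lines_ignoring_order("new_results.txt",
"checked_in_results.txt") (both names hypothetical), it returns True only
when the two files differ in order alone.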
If all of this works you will have
a) validated that you indeed get the expected reference results before
your fix and
b) created a pull request for reference results in a form that makes the
comparison with the results that you'll get after your fix easier.
I think that the big diff [2] that you see in the materialized reference
results right now is due to both the changes introduced by your change
and the changes introduced by the different sort order.
Does this make sense?
Cheers,
Till
[1] https://issues.apache.org/jira/browse/VXQUERY-187
[2]
https://github.com/apache/vxquery/pull/32/commits/131915a2bb38b06e6ef2d27a24c50201d1dab13c
Please kindly help.
[1] https://www.w3.org/TR/xpath-functions/#func-tokenize
[2]
http://riyafa.github.io/Riyafa-Abdul-Hameed--web-page/others/full_report.html
Thank you.
Yours sincerely,
Riyafa
On 16 March 2016 at 10:59, Till Westmann <[email protected]> wrote:
Hi,
I took a brief look at your change, and here are a few next steps that
could help to get a better handle on the issue:
1) One of the problems with the diff for the expected results is that the
   instructions to create the diff that you find on the website are not
   consistent with the current reality [1]. So one good step would be to
   a) recreate the expected results with an unmodified checkout following
      the instructions on the website and
   b) checking that only the order has changed, and
   c) creating a PR for that.
2) Rerun the tests with your patch, categorize the failures by stack
   trace, and explain at least one failure in more detail on the list
   (with a stack trace, pointers to the code and possible explanations,
   if they come to mind). E.g. it would be good to see in the e-mail what
   the system did when the creation of a sequence for fn:tokenize didn’t
   work.
Does this make sense?
Cheers,
Till
[1] https://issues.apache.org/jira/browse/VXQUERY-187
On 11 Mar 2016, at 18:11, Riyafa Abdul Hameed wrote:
Hi,
I tried fixing the errors from the test results, but I was unable to fix
some of them. You can find the full error report here [1]. The test cases
related to this PR are 13865 to 14054.
There are errors related to exception handling, and since I am using the
available Java functions I am not sure how I could catch such errors.
Also, I don't seem to be matching UTF-8 strings correctly: I tried to get
the byte array and convert it to a UTF-8 string, but it wouldn't work.
The related errors are 13918 to 13921.
According to [2], I think we should convert all the UTF-8 characters as
appropriate when adding them to a StringBuilder in the
UTF8StringPointable class, but I am not sure how I could do that.
Also, I tried converting the result of fn:tokenize to a sequence of
strings (using a sequence builder) instead of a single string, but in
vain. Maybe I have understood things incorrectly. Can you please help me
figure out how I could fix these errors?
(I sent a previous mail which was not delivered because I tried to attach
the error report.)
[1]
http://riyafa.github.io/Riyafa-Abdul-Hameed--web-page/others/full_report.html
[2] http://stackoverflow.com/a/5729843/3599535
Thank you.
Yours sincerely,
Riyafa
On 10 March 2016 at 14:04, Till Westmann <[email protected]> wrote:
Hi Riyafa,
I just looked at your PR [1] and realized that the diff in the results
file is very big.
I think that this might be due to a recent commit by Preston [2] that
changed the sorting of the results file a bit.
Could you check whether that’s indeed the case and, if so, create a new
results file with the same order as the one that’s currently checked in?
Otherwise, could you validate that queries that use the new functions
work correctly now?
Cheers,
Till
[1] https://github.com/apache/vxquery/pull/32/
[2]
https://github.com/apache/vxquery/commit/43852a5476ccb33bf9ee58e27468b400cc169d6a#diff-39476c050696c8ab9f59540b607ba92e
--
Riyafa Abdul Hameed
Undergraduate, University of Moratuwa
Email: [email protected]
Website: https://riyafa.wordpress.com/
<http://riyafa.wordpress.com/>
<http://facebook.com/riyafa.ahf> <http://lk.linkedin.com/in/riyafa>
<http://twitter.com/Riyafa1>