Thanks to Craig and Colin,

I spent all day studying the Stack Overflow post and the document Colin 
provided. I now understand some (not all) of the rules of the request 
payload, and I can use them to replace parameters. But deserializing the 
response is hard, and you did not cover it in your answer. My goal is still 
to parse the response received from the server. 

I need to crawl 10000 users via GWT RPC. It is a one-time crawl (I run it 
once) for my service, so performance is not important.

Again, I know some of the data from the server (response2.png, which I 
attached before). It is JSON that starts with //OK, followed by an array 
with 2045 elements (0-2044). Elements 0-2041 are confusing to me. Element 
2042 is an array list whose entries appear in jumbled order; maybe the data 
above (elements 0-2041) contains their order.

If you need the exact response payload, please reply and I will publish it.

On Wednesday, 9 October 2024 at 11:13:42 UTC+7, Craig Mitchell wrote:

> I second what Colin says "*Since you're scraping anyway, consider just 
> scraping the results of the rendered page? This will likely take 
> substantially more CPU time, but ridiculously less developer time to 
> implement."*
>
> GWT RPC is not an API.  It will constantly change as the website updates.
>
> I'd recommend either using a proper API (if one exists), or an 
> off-the-shelf scraper tool.
>
> On Wednesday 9 October 2024 at 4:40:08 am UTC+11 Colin Alworth wrote:
>
>> I'd suggest reading the stream reader/writer subtypes of 
>> AbstractSerializationStream to understand what all of the values are for - 
>> in short, a gwt-rpc response is a payload and a string table, and the 
>> payload's elements will reference the string table. You cannot know what the 
>> structure is for certain without seeing the original Java types being 
>> serialized, but often you can make good guesses.
>>
>> I'd also suggest reading stackoverflow posts and the like showing how to 
>> deserialize other payloads just from context - here's a post that breaks 
>> down a payload to understand its contents: 
>> https://stackoverflow.com/questions/35047102/serializing-rpc-gwt/35047887#35047887
>>
>> If you haven't yet, read 
>> https://docs.google.com/document/d/1eG0YocsYYbNAtivkLtcaiEE5IOF5u4LUol8-LL0TIKU/edit
>>  
>> as well.
>>
>> In short though, your response value is _probably_ a List of 
>> CourseMember types - knowing that class will help you. I can't easily guess 
>> more though. As the above doc says, the JSON array is read backwards, so 
>> the important details would be right before and after the string array - 
>> you have 1,7,2,1[...strings...] in the second image. From that I can say
>>
>> 1: if this was zero, it would be a null; since it is a positive number, 
>> read the (value - 1) entry from the string table, which is ArrayList, so: 
>> read a value of type ArrayList from the stream 
>> 7: the ArrayList has 7 items
>> 2: first item in the ArrayList - as above, if this was 0, it would be 
>> null; since it is positive, read the (value - 1) entry from the string 
>> table, and decode that type, so: read a CourseMember object from the payload
>> 1: this is _probably_ the number 1 in the first field of the first 
>> CourseMember.
>> ...
>>
>> A parser continuing in this way, with knowledge of the structure of these 
>> types could be written to decode this payload. I don't know of an 
>> off-the-shelf tool that will do it for you in a truly automated way, but 
>> could consult to write one, or guide your project in implementing one by 
>> hand.
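
The breakdown above can be sketched as a small reader. This is only an illustration under stated assumptions, not the actual GWT reader: the class name RpcResponseReader is hypothetical, the tokens are assumed to be already split out of the JSON array (in the order they appear in it), and the string table entries in the usage note are made up (real entries normally carry a "/<hash>" suffix, as in the request body quoted further down).

```java
import java.util.List;

/** Hypothetical sketch of a backwards-reading GWT-RPC response reader. */
public class RpcResponseReader {
    private final List<String> payload;     // tokens before the string table, in JSON order
    private final List<String> stringTable; // entries of the nested string array
    private int index;                      // next token to read, moving towards 0

    public RpcResponseReader(List<String> payload, List<String> stringTable) {
        this.payload = payload;
        this.stringTable = stringTable;
        this.index = payload.size() - 1;    // the stream is consumed from the end
    }

    /** Reads the next token (going backwards) as an int. */
    public int readInt() {
        return Integer.parseInt(payload.get(index--));
    }

    /** Returns null for 0, otherwise the (value - 1) entry of the string table. */
    public String readString() {
        int ref = readInt();
        return ref == 0 ? null : stringTable.get(ref - 1);
    }

    /** True while unread tokens remain. */
    public boolean hasMore() {
        return index >= 0;
    }
}
```

With a string table of ["java.util.ArrayList", "a1.a2.a3.CourseMember"] (hypothetical values), the sequence 1, 7, 2, 1 from the breakdown would be consumed as readString() for the list type, readInt() for the size, readString() for the first element's type, and readInt() for what is probably its first field. Decoding the remaining fields still requires knowing the field order of CourseMember.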
>>
>> Since you're scraping anyway, consider just scraping the results of the 
>> rendered page? This will likely take substantially more CPU time, but 
>> ridiculously less developer time to implement. 
>>
>> On Tuesday, October 8, 2024 at 12:03:30 PM UTC-5 [email protected] wrote:
>>
>>> Thank you for the detailed response.
>>>
>>> I want to crawl data from a public website. I opened devtools and saw 
>>> that it uses GWT RPC.
>>>
>>> This is the request body I saw: 
>>>
>>> 7|0|10|
>>> https://a.b.c.d/e|5C6CDB13D0FD25B266F3C36FA7FF6ED9|a1.a2.a3.DataService|getCourseMembers|java.lang.Long/4227064769|java.lang.String/2004016611|java.util.List|20204524|java.util.Arrays$ArrayList/2507071751|20241|1|2|3|4|3|5|6|7|5|TXbrzIAAA|8|9|1|6|10|
>>>
>>> As you can see, the syntax is not a problem. I can roughly understand it, 
>>> and I know the method is getCourseMembers. I want to build a function that 
>>> returns the body above, like: 
>>>                   public static String getBodyEncoded(String methodName, 
>>> ... String methodBody ...) or something similar, so that I can send the 
>>> body to the server.
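
A minimal sketch of such a function, assuming the body is simply version, flags, string table size, the string table entries, and then the payload tokens, each followed by "|". The class and method names are hypothetical, and the sketch assumes none of the strings contain "|" or a backslash (those would need escaping, which is not handled here).

```java
import java.util.List;

/** Hypothetical helper that joins the parts of a GWT-RPC request body. */
public class RpcRequestBuilder {

    /**
     * Builds: version|flags|tableSize|entry1|...|entryN|token1|...|tokenM|
     * Payload tokens are passed as strings because some of them
     * (such as TXbrzIAAA in the body above) are not plain integers.
     */
    public static String encode(int version, int flags,
                                List<String> stringTable, List<String> payload) {
        StringBuilder sb = new StringBuilder();
        sb.append(version).append('|');
        sb.append(flags).append('|');
        sb.append(stringTable.size()).append('|');
        for (String entry : stringTable) {
            sb.append(entry).append('|');   // assumption: no escaping needed
        }
        for (String token : payload) {
            sb.append(token).append('|');
        }
        return sb.toString();
    }
}
```

Calling encode(7, 0, table, payload) with the ten strings and the fifteen payload tokens from the body above should reproduce it character for character. Replacing a parameter then means changing the matching string table entry (for example "20241") and keeping the payload indices unchanged.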
>>>
>>> I also want to know the last part of the request syntax:
>>>                  1|2|3|4|1|5|6|7|7|8|7|9|7|10|7|11|7|12|7|13|7|14|
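
As far as I understand the wire format, the numbers after the string table are mostly 1-based references into it: the first four point at the module base URL, the policy name, the service interface and the method name, then comes the parameter count, then one type reference per parameter, and finally the parameter values. A rough sketch that reads this header, with a hypothetical class name and the assumptions that flags is 0 and that no string contains "|":

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch that reads the header of a GWT-RPC request body. */
public class RpcRequestHeader {
    public final String moduleBaseUrl;
    public final String policyName;
    public final String serviceInterface;
    public final String methodName;
    public final List<String> paramTypes = new ArrayList<>();
    public final List<String> remaining;   // tokens that encode the parameter values

    public RpcRequestHeader(String body) {
        String[] t = body.split("\\|");            // trailing empty token is dropped
        int count = Integer.parseInt(t[2]);        // t[0] = version, t[1] = flags
        List<String> all = Arrays.asList(t);
        List<String> table = all.subList(3, 3 + count);
        int pos = 3 + count;                       // first payload token
        moduleBaseUrl = table.get(Integer.parseInt(t[pos++]) - 1);
        policyName = table.get(Integer.parseInt(t[pos++]) - 1);
        serviceInterface = table.get(Integer.parseInt(t[pos++]) - 1);
        methodName = table.get(Integer.parseInt(t[pos++]) - 1);
        int paramCount = Integer.parseInt(t[pos++]);
        for (int i = 0; i < paramCount; i++) {
            paramTypes.add(table.get(Integer.parseInt(t[pos++]) - 1));
        }
        remaining = all.subList(pos, t.length);
    }
}
```

For the body shown earlier this gives the method getCourseMembers with three parameter types (Long, String, List) and leaves 5|TXbrzIAAA|8|9|1|6|10 as the value tokens. In the fragment above, 1|2|3|4|1|5 would then mean one parameter whose type is string table entry 5, followed by its value tokens.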
>>>
>>> Next is the response body. This is the real problem. The response is 
>>> very long, so I put it in the attached files.
>>>
>>> I saw a JSON array with more than 2000 elements, and I cannot understand 
>>> what they are. The only thing I understand is the 2042nd element, which 
>>> contains an unordered list. Maybe some of the elements before it contain 
>>> data about the order.
>>>
>>> I want to build a method to extract/deserialize this response.
>>>
>>> I am a newbie. If my question can be answered, could you please guide me 
>>> in more detail?
>>> Java is preferred, but other languages are acceptable; I can still deploy them.
>>>