Re: [casper] SNAP FPGA data endianness and networking

2020-08-19 Thread Jack Hickish
On a more serious note, I would generally add that in any FPGA system - in
particular those that output data over Ethernet - IMO there should be
some mechanism to replace the stream with known test vectors. Inbuilt test
vectors make finding bugs (like endianness issues) down the road so much
easier, and I'm constantly surprised at how many large designs don't
include them.

And as I write this I'm realising that none of the CASPER tutorials have
test vectors. Whoops!

Cheers
Jack
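
A minimal sketch of the receive-side check that such test vectors make possible, in C. Everything here is an assumption for illustration, not part of any CASPER library: the pattern (word i carries TV_BASE + i, sent most-significant byte first) and the names tv_fill and tv_check are hypothetical. The idea is that a known ramp lets the receiver tell a byte-order mistake apart from real corruption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test pattern: 32-bit word i of the payload carries
 * TV_BASE + i, most-significant byte first (network order). */
#define TV_BASE 0x01020304u

enum tv_result { TV_OK, TV_BYTE_SWAPPED, TV_CORRUPT };

/* Fill a payload the way the assumed FPGA test-vector source would. */
static void tv_fill(uint8_t *buf, size_t nwords)
{
    assert(buf != NULL);
    for (size_t i = 0; i < nwords; i++) {
        uint32_t v = TV_BASE + (uint32_t)i;
        buf[4 * i + 0] = (uint8_t)(v >> 24);
        buf[4 * i + 1] = (uint8_t)(v >> 16);
        buf[4 * i + 2] = (uint8_t)(v >> 8);
        buf[4 * i + 3] = (uint8_t)v;
    }
}

/* Compare a received payload with the pattern, reading every word
 * both big-endian and little-endian to diagnose a byte-order bug. */
static enum tv_result tv_check(const uint8_t *buf, size_t nwords)
{
    int ok_be = 1, ok_le = 1;
    assert(buf != NULL);
    for (size_t i = 0; i < nwords; i++) {
        const uint8_t *p = buf + 4 * i;
        uint32_t be = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                      ((uint32_t)p[2] << 8) | (uint32_t)p[3];
        uint32_t le = ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
                      ((uint32_t)p[1] << 8) | (uint32_t)p[0];
        uint32_t expect = TV_BASE + (uint32_t)i;
        if (be != expect) ok_be = 0;
        if (le != expect) ok_le = 0;
    }
    if (ok_be) return TV_OK;
    if (ok_le) return TV_BYTE_SWAPPED;
    return TV_CORRUPT;
}
```

Calling tv_check() on each received payload while the design is switched to its test-vector source would flag a swapped byte order as TV_BYTE_SWAPPED rather than as unexplained garbage.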

Re: [casper] SNAP FPGA data endianness and networking

2020-08-18 Thread Nitish Ragoomundun
:)
Thank you all for this small debate.
Our sampling rates are in the hundreds of MHz but we have many, many
dual-polarised antennas, so we would rather go with the pragmatic solution
to be on the safe side. All the processing nodes are little-endian and the
final products that we might be sharing will be FITS files containing
correlation matrices. But we understand the importance of documenting the
system and the data products very clearly. After this discussion we intend
to make this byte swapping a simple, separate module so that, if we share
the design, that part can be removed easily before re-compiling for a
big-endian system.

Many thanks.
Nitish


Re: [casper] SNAP FPGA data endianness and networking

2020-08-18 Thread David MacMahon
Is it April already? :) :) :)

Re: [casper] SNAP FPGA data endianness and networking

2020-08-18 Thread Jack Hickish
There is, of course, always the compromise option of using half
network-endianness and half little-endianness. For example, all positive
numbers could be encoded with big-endian and negative numbers could be
encoded little-endian. This would incur a similar overhead on both little-
and big-endian CPU platforms, and would also be easily parallelizable on a
GPU decoder.

Yours,

Nathan Poe

Re: [casper] SNAP FPGA data endianness and networking

2020-08-18 Thread James Smith
Hi Dave,

Yes of course! Though it makes little sense IMO to do the conversion on the
host CPU, as GPUs are well-equipped to do this operation quickly if the
need arises.

In some cases being pragmatic is important - if your instrument is small,
for example, and you don't have any user-supplied equipment. In the MeerKAT
case, however, where we specifically cater for third-party computers
connecting to our network, some sort of standards compliance comes in
very handy. Though most of our data is 8- (or 10-) bit anyway, so byte order
makes little difference.

Regards,
James


Re: [casper] SNAP FPGA data endianness and networking

2020-08-18 Thread David MacMahon
I guess I’m going to play angel’s advocate and suggest the pragmatic over the 
dogmatic. :)

Some standards mandate network byte order, aka big endian, but if you’re not 
constrained in that way and you know that the data will be processed downstream 
by a little-endian system for the foreseeable future, then I think it makes 
sense to send it out in little-endian form. You can use `le32toh()` etc in the 
receiving code to make it host-endian agnostic, but on little-endian systems 
that is optimized away to nothing. Sure, that might only be saving 1 CPU cycle 
per value, but when you’re dealing with billions of values per second that can 
start adding up!

Of course, the packet format should be documented regardless of which endianness 
is used. Future users will thank you.

Cheers,
Dave
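
A minimal sketch of the host-endian-agnostic receive code described above, assuming the packet format is documented as 32-bit little-endian words. On glibc the `le32toh()` family lives in `<endian.h>`; because that header is not available everywhere, this sketch uses a portable shift-based load instead, which yields the same values on either kind of host. The names load_le32 and decode_le32_payload are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one 32-bit little-endian value from a byte buffer.
 * The result is the same on little- and big-endian hosts. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Decode nwords little-endian 32-bit samples from a packet payload
 * into host-order integers. */
static void decode_le32_payload(const uint8_t *payload, size_t nwords,
                                uint32_t *out)
{
    assert(payload != NULL && out != NULL);
    for (size_t i = 0; i < nwords; i++)
        out[i] = load_le32(payload + 4 * i);
}
```

Optimizing compilers commonly reduce load_le32() to a plain load on little-endian targets, though that is worth confirming for the compiler and flags actually in use.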

Re: [casper] SNAP FPGA data endianness and networking

2020-08-18 Thread James Smith
Hello Nitish,

So I'm going to play devil's advocate and say that while you could do the
byte swapping in the FPGA, it would be morally wrong ;-)

Ideally, all data that goes out on a network will be network order, and you
use the ntohl or ntohs functions to get it in host format. That way the
code stays more portable - if you one day find yourself on a big-endian
system, it would work without modification.
(https://en.wikipedia.org/wiki/Endianness#Networking)

Sometimes for performance reasons you may have to make these kinds of
compromises, and if you do you should document them well! But most modern
servers should have no issue with 10Gb/s datarates. You could probably even
do the swaps in the GPUs using Nvidia's primitives.

Regards,
James
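
A minimal sketch of the network-order approach, assuming the FPGA sends each 64-bit word most-significant byte first and that the word holds the 2x32-bit pair (A in the upper half, B in the lower half) described in the reply quoted below. `ntohl()` is the standard POSIX call from `<arpa/inet.h>`; the helper name pkt_from_network is hypothetical.

```c
#include <arpa/inet.h> /* ntohl(), POSIX */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layout from the quoted reply: A occupies the upper 32 bits of the
 * 64-bit word fed to the 10GbE block, B the lower 32 bits. */
typedef struct pkt {
    uint32_t A;
    uint32_t B;
} pkt;

/* Copy one 8-byte word out of a received payload and convert both
 * fields from network (big-endian) order to host order. */
static pkt pkt_from_network(const uint8_t *wire)
{
    pkt p;
    assert(wire != NULL);
    memcpy(&p, wire, sizeof p); /* memcpy avoids unaligned access */
    p.A = ntohl(p.A);           /* no-op on big-endian hosts */
    p.B = ntohl(p.B);
    return p;
}
```

The same source then builds unchanged on a big-endian host, which is the portability argument made above.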




On Tue, Aug 18, 2020 at 1:28 PM Nitish Ragoomundun <
nitish.ragoomun...@gmail.com> wrote:

> Hi,
>
> Thanks a lot Jack. It makes sense.
> And thank you very much for the note on the 2x32-bit pair. It is exactly
> how our data is formatted.
> Ok, we will go with an FPGA correction instead of a CPU byteswap. I am
> guessing it will be faster this way.
>
> Thanks again.
> Cheers
> Nitish
>
>
> On Tue, Aug 18, 2020 at 4:47 PM Jack Hickish 
> wrote:
>
>> Hi Nitish,
>>
>> To try and answer your first question without adding confusion --
>>
>> If you send a UFix64_0 value into the 10GbE block, you will need to
>> interpret it on the other end via an appropriate 64-bit byte swap if your
>> CPU is little-endian.
>> If you send a 64-bit input into the 10GbE block where the most
>> significant 32 bits are the value A, and the least significant 32 bits are
>> value B, you should interpret the 64 bits on your little-endian CPU as the
>> struct
>>
>> typedef struct pkt {
>>   uint32_t A;
>>   uint32_t B;
>> } pkt;
>>
>> where each of the A and B will need byteswapping before you use them.
>>
>> To answer your second question --
>> Yes, you can absolutely flip the endianness on the FPGA prior to
>> transmission so you don't have to byteswap on your CPU. You can do this
>> with bus-expand + bus-create blocks, using the first to split your
>> words into bytes, and then flipping them before concatenating. The Xilinx
>> "bitbasher" block would also be good for this, using the Verilog (for a
>> 64-bit input):
>>
>> out = {in[7:0], in[15:8], in[23:16], in[31:24], in[39:32], in[47:40],
>> in[55:48], in[63:56]}
>>
>> If your 64-bit data streams are not made up of 64-bit integers (e.g., they
>> are pairs of 32-bit integers) then you should flip the 4 bytes of each
>> value individually, but leave the ordering of the two values within the 64
>> bits unchanged.
>>
>> Hopefully that makes sense
>>
>> Jack
>>
>>
>> On Tue, 18 Aug 2020 at 13:28, Nitish Ragoomundun <
>> nitish.ragoomun...@gmail.com> wrote:
>>
>>>
>>> Hello,
>>>
>>> We are setting up the digital back-end of a low-frequency telescope
>>> consisting of SNAP boards and GPUs. The SNAP boards packetize the data and
>>> send to the GPU processing nodes via 10 GbE links. We are currently
>>> programming the packetizer/depacketizer.
>>> I have a few questions about the 10gbe yellow blocks and endianness. We
>>> observed from the tutorials that the data stored in bram is big-endian. I
>>> would like to know how the data is handled by the 10gbe and in what form it
>>> is sent over the network.
>>> Our depacketizers run on Intel processors, which are little-endian. We
>>> are aware that network byte order is big-endian, but we noticed that
>>> integer data can be sent from one Intel machine to another via network
>>> without ever calling ntohl( ) or htonl( ) and the data was preserved. So,
>>> we would like to know if we need to correct the endianness when receiving
>>> the data from the SNAP.
>>>
>>> If we need to perform this correction, is there a way we could possibly
>>> correct the endianness on the FPGA itself before input to the 10gbe block?
>>>
>>> Thanks,
>>> Nitish
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "casper@lists.berkeley.edu" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to casper+unsubscr...@lists.berkeley.edu.
>>> To view this discussion on the web visit
>>> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/CAC6X4cOZhVBUvUfs1phQ2csuRnewowZkQ8PzjjBU62LUa0js%3Dw%40mail.gmail.com
>>> 
>>> .
>>>

Re: [casper] SNAP FPGA data endianness and networking

2020-08-18 Thread Nitish Ragoomundun
Hi,

Thanks a lot Jack. It makes sense.
And thank you very much for the note on the 2x32-bit pair. It is exactly
how our data is formatted.
Ok, we will go with an FPGA correction instead of a CPU byteswap. I am
guessing it will be faster this way.

Thanks again.
Cheers
Nitish


On Tue, Aug 18, 2020 at 4:47 PM Jack Hickish  wrote:

> Hi Nitish,
>
> To try and answer your first question without adding confusion --
>
> If you send a UFix64_0 value into the 10GbE block, you will need to
> interpret it on the other end via an appropriate 64-bit byte swap if your
> CPU is little-endian.
> If you send a 64-bit input into the 10GbE block where the most significant
> 32 bits are the value A, and the least significant 32 bits are the value B,
> you should interpret the 64 bits on your little-endian CPU as the struct
>
> typedef struct pkt {
>   uint32_t A;
>   uint32_t B;
> } pkt;
>
> where each of the A and B will need byteswapping before you use them.
>
> To answer your second question --
> Yes, you can absolutely flip the endianness on the FPGA prior to
> transmission so you don't have to byteswap on your CPU. You can do this
> with bus-expand + bus-create blocks, using the first to split your words
> into bytes and then flipping them before concatenating. The Xilinx
> "bitbasher" block would also be good for this, using the Verilog (for a
> 64-bit input):
>
> out = {in[7:0], in[15:8], in[23:16], in[31:24], in[39:32], in[47:40],
> in[55:48], in[63:56]}
>
> If your 64-bit data streams are not made up of 64-bit integers (e.g., they
> are pairs of 32-bit integers) then you should flip the 4 bytes of each
> value individually, but leave the ordering of the two values within the 64
> bits unchanged.
>
> Hopefully that makes sense
>
> Jack
>
>
> On Tue, 18 Aug 2020 at 13:28, Nitish Ragoomundun <
> nitish.ragoomun...@gmail.com> wrote:
>
>>
>> Hello,
>>
>> We are setting up the digital back-end of a low-frequency telescope
>> consisting of SNAP boards and GPUs. The SNAP boards packetize the data and
>> send to the GPU processing nodes via 10 GbE links. We are currently
>> programming the packetizer/depacketizer.
>> I have a few questions about the 10gbe yellow blocks and endianness. We
>> observed from the tutorials that the data stored in bram is big-endian. I
>> would like to know how the data is handled by the 10gbe and in what form it
>> is sent over the network.
>> Our depacketizers run on Intel processors, which are little-endian. We
>> are aware that network byte order is big-endian, but we noticed that
>> integer data can be sent from one Intel machine to another via network
>> without ever calling ntohl( ) or htonl( ) and the data was preserved. So,
>> we would like to know if we need to correct the endianness when receiving
>> the data from the SNAP.
>>
>> If we need to perform this correction, is there a way we could possibly
>> correct the endianness on the FPGA itself before input to the 10gbe block?
>>
>> Thanks,
>> Nitish
>>



Re: [casper] SNAP FPGA data endianness and networking

2020-08-18 Thread Jack Hickish
Hi Nitish,

To try and answer your first question without adding confusion --

If you send a UFix64_0 value into the 10GbE block, you will need to
interpret it on the other end via an appropriate 64-bit byte swap if your
CPU is little-endian.
If you send a 64-bit input into the 10GbE block where the most significant
32 bits are the value A, and the least significant 32 bits are the value B,
you should interpret the 64 bits on your little-endian CPU as the struct

typedef struct pkt {
  uint32_t A;
  uint32_t B;
} pkt;

where each of the A and B will need byteswapping before you use them.
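
A minimal sketch of that receive-side step, assuming the payload carries the 64-bit word most significant byte first (as described above) and that POSIX `ntohl` is available. The function name `decode_pkt` is hypothetical and only for illustration:

```c
#include <arpa/inet.h>  /* ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

typedef struct pkt {
    uint32_t A;  /* most significant 32 bits of the 64-bit FPGA word */
    uint32_t B;  /* least significant 32 bits of the 64-bit FPGA word */
} pkt;

/* Copy one 64-bit word out of the UDP payload, then convert each field
 * from network (big-endian) order to host order. decode_pkt is a
 * hypothetical helper name, not part of any CASPER library. */
static pkt decode_pkt(const unsigned char *payload)
{
    pkt p;
    memcpy(&p, payload, sizeof p);
    p.A = ntohl(p.A);
    p.B = ntohl(p.B);
    return p;
}
```

Note that the order of A and B in memory is already correct; only the bytes inside each field need swapping.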

To answer your second question --
Yes, you can absolutely flip the endianness on the FPGA prior to
transmission so you don't have to byteswap on your CPU. You can do this
with bus-expand + bus-create blocks, using the first to split your words
into bytes and then flipping them before concatenating. The Xilinx
"bitbasher" block would also be good for this, using the Verilog (for a
64-bit input):

out = {in[7:0], in[15:8], in[23:16], in[31:24], in[39:32], in[47:40],
in[55:48], in[63:56]}

If your 64-bit data streams are not made up of 64-bit integers (e.g., they
are pairs of 32-bit integers) then you should flip the 4 bytes of each
value individually, but leave the ordering of the two values within the 64
bits unchanged.
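
To make the difference between the two cases concrete, here is a small software model in C of what the FPGA-side flip should produce. It is only a sketch for checking expected values (for example against a test vector), and the function names `swap64` and `swap32_pairs` are hypothetical:

```c
#include <stdint.h>

/* Reverse all 8 bytes of a 64-bit word: the single 64-bit integer case. */
static uint64_t swap64(uint64_t x)
{
    x = ((x & 0x00000000FFFFFFFFull) << 32) | (x >> 32);
    x = ((x & 0x0000FFFF0000FFFFull) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFull);
    x = ((x & 0x00FF00FF00FF00FFull) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFull);
    return x;
}

/* Reverse the 4 bytes inside each 32-bit half, leaving the two halves in
 * place: the pair-of-32-bit-values case. */
static uint64_t swap32_pairs(uint64_t x)
{
    x = ((x & 0x0000FFFF0000FFFFull) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFull);
    x = ((x & 0x00FF00FF00FF00FFull) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFull);
    return x;
}
```

For the input 0x0102030405060708, `swap64` gives 0x0807060504030201 while `swap32_pairs` gives 0x0403020108070605, so the value in the upper half stays in the upper half.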

Hopefully that makes sense

Jack


On Tue, 18 Aug 2020 at 13:28, Nitish Ragoomundun <
nitish.ragoomun...@gmail.com> wrote:

>
> Hello,
>
> We are setting up the digital back-end of a low-frequency telescope
> consisting of SNAP boards and GPUs. The SNAP boards packetize the data and
> send to the GPU processing nodes via 10 GbE links. We are currently
> programming the packetizer/depacketizer.
> I have a few questions about the 10gbe yellow blocks and endianness. We
> observed from the tutorials that the data stored in bram is big-endian. I
> would like to know how the data is handled by the 10gbe and in what form it
> is sent over the network.
> Our depacketizers run on Intel processors, which are little-endian. We are
> aware that network byte order is big-endian, but we noticed that integer
> data can be sent from one Intel machine to another via network without ever
> calling ntohl( ) or htonl( ) and the data was preserved. So, we would like
> to know if we need to correct the endianness when receiving the data from
> the SNAP.
>
> If we need to perform this correction, is there a way we could possibly
> correct the endianness on the FPGA itself before input to the 10gbe block?
>
> Thanks,
> Nitish
>



[casper] SNAP FPGA data endianness and networking

2020-08-18 Thread Nitish Ragoomundun
Hello,

We are setting up the digital back-end of a low-frequency telescope
consisting of SNAP boards and GPUs. The SNAP boards packetize the data and
send to the GPU processing nodes via 10 GbE links. We are currently
programming the packetizer/depacketizer.
I have a few questions about the 10gbe yellow blocks and endianness. We
observed from the tutorials that the data stored in bram is big-endian. I
would like to know how the data is handled by the 10gbe and in what form it
is sent over the network.
Our depacketizers run on Intel processors, which are little-endian. We are
aware that network byte order is big-endian, but we noticed that integer
data can be sent from one Intel machine to another via network without ever
calling ntohl( ) or htonl( ) and the data was preserved. So, we would like
to know if we need to correct the endianness when receiving the data from
the SNAP.

If we need to perform this correction, is there a way we could possibly
correct the endianness on the FPGA itself before input to the 10gbe block?

Thanks,
Nitish
